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ABSTRACT 

We present a measurement of the angular bispectrum of the millimeter- wave sky in observing bands 
centered at roughly 95, 150, and 220 GHz, on angular scales of 1' < 9 < 10' (multipole number 
1000 < I < 10,000). At these frequencies and angular scales, the main contributions to the bispectrum 
are expected to be the thermal Sunyaev-Zel'dovich (tSZ) effect and emission from extragalactic sources, 
predominantly dusty, star-forming galaxies (DSFGs) and active galactic nuclei. We measure the 
bispectrum in 800 deg 2 of three-band South Pole Telescope data, and we use a multi-frequency fitting 
procedure to separate the bispectrum of the tSZ effect from the extragalactic source contribution. 
We simultaneously detect the bispectrum of the tSZ effect at >10cr, the unclustered component of 
the extragalactic source bispectrum at >6cx in each frequency band, and the bispectrum due to the 
clustering of DSFGs — i.e., the clustered cosmic infrared background (CIB) bispectrum — at >5ct. This 
is the first reported detection of the clustered CIB bispectrum. We use the measured tSZ bispectrum 
amplitude, compared to theoretical predictions, to constrain the normalization of the matter power 
spectrum to be a% = 0.786 ± 0.031 and to predict the amplitude of the tSZ power spectrum. This 
prediction improves our ability to separate the thermal and kinematic contributions to the total SZ 
power spectrum. The addition of bispectrum data improves our constraint on the tSZ power spectrum 
amplitude by a factor of two compared to power spectrum measurements alone and provides the first 
evidence of a nonzero kinematic SZ (kSZ) power spectrum, with a derived constraint on the kSZ 
amplitude at I = 3000 of A kSZ = 2.9 ± 1.5 fiK 2 . 

Subject headings: cosmology: observations — cosmology: cosmic background radiation — methods: 
data analysis 
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Measurements of the millimeter-wave sky are a rich 
source of cosmological information. Studies of the inten- 
sity of the cosmic microwave background (CMB) have 
provided much of the evidence for our current cosmo- 
logica l model (e.g., IKomatsu et al.|[2011t iHinshaw et al.l 
\20Wi, and ever more sensitive and wide-ranging ex- 
periments (in terms of both sky coverage and range of 
angular scales probed) continue to i mprove our con- 
straints on cosmological parameters dReichardt et al 
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|20H IHou et alJl2(Hl ISievers et al.l [20l3h . Beyond re- 
vealing the properties of the primary CMB fluctua- 
tions (those generated at the surface of last scatter- 
ing), the high- resolution millimeter- wave sky maps gen- 
erated by current experiments also enable the study of 
secondary anisotropics in the CMB — those due to in- 
teractions of the CMB photons with matter along the 
line of sight — as well as emissive foreground sources. 
These secondary signals also carry interesting cosmo- 
logical informatio n, probing different epochs of cos- 
mic history (e g.. ILueker et all 120101: iDas et al.l [201 lbb 
iReichardt et al.ll2012f) . ~ 

The primary CMB signal arising from fluctuations at 
the surface of last scattering is expected to be very close 
to a Gaussian random field. The simplest models of in- 
flation predict very sm all levels of non-Gaussianity (e.g., 
lAcquaviva et al.1 120031 ). and experimental results up to 
this point have been essentially consis tent with these 
predictions (|Komatsu et al.l [20091 1201 II ). Under the as- 
sumption of Gaussianity, all of the information about 
CMB intensity fluctuations is contained in the second 
moment of the distribution. Hence, the power spectrum 
has historically been the simplest and most useful statis- 
tic for characterizing CMB fluctuations and constraining 
cosmological models. 

Searches for non-G aussianity in the primar y CMB, as 
pointed out in, e.g., iColes fe Barrowl (119871 ). have the 
potential to inform inflationary models. These measure- 
ments involve constructing statistics that test the skew- 
ness of the maps, or, more generally, that depend on the 
three-point angular correlation function or its harmonic- 
space equivalent, the bispectrum. Such statistics can 
be tailored for sensitivity to specific models for infla- 
tionary non-Gaussianity, which are then parameterized 
with a single amplitude /nl- Several analyses of WMAP 
data have produced conflicting constraints on the ampli- 
tude of a particular model — the so-called "local model" — 
ranging from tentative detections to limits consistent 
with /nl local = 0, but always with erro r bars of *5/nt, ~ 
20 - 30 '(e.g., lYadav fe W^rdeRI 120011 iKomatsu et aD 
120 lit ). Analyses of higher-resolution CMB data over 
small patches of the sky have resu lted in upper lim- 
its of order /NL.i ocai < 1000 (e.g., iSantos et all [20021: 
ISmith et al.ll200l . More precise constraints or measure- 
ments of /nl will require a combination of large sky area 
and high sensitivity; limits from Pla nck are expected t o 
be at the level of <5/ N l ~ 5 (e.g., lYadav et all 120071 ). 
Progress will also depend on understanding the non- 
Gaussian behavior of secondary CMB anisotropies and 
foregrounds. 

Secondary CMB anisotr opie s include gravitational 
lensing of the CMB (e.g., iHul I2000D and the thermal 
Suny aev-Zel'dovich (SZ) effect (jSunvaev fc Zel'dovichl 
[1970 ). These signals are non-Gaussian in general, al- 
though they are often studied through their effect on the 
power spectrum. For example, detections of the effect 
of gravitational lensing in CMB data have often relied 
on the effect of acoustic peak sme aring in the two-point 



function or powe r spectrum (e.g. . IReichardt et al 
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120111* iKeisler et al.ll20lH I.Stor\T7"aT 



2009; 



2012 



120131 ) . but more information can be extracted 



distortion of the CMB through interactions with hot 
gas in galaxy clusters and provides an efficient means of 
finding distant, massive clusters ( e.g., Williamson et al 



2011b iPlanck Collaboration et al 1 120 lit IReichardt" et a 
20ia lHasselfield et al.1 \20lW . The tSZ power spec- 



from CMB l ensing when the four-point function is con - 
sidered (e.g.. IDas etalll2011al: Ivan Engelen et al.ll2012l ). 
The thermal SZ (tSZ) effect arises from the spectral 



trum measures the mean squared signal from all clus- 
ters and is a sensitive probe of the normalization of the 
matter power spectrum erg, with the tSZ power spec- 
tr um amplitude predicted to scale as al, with 7 ~ 7- 
8 ([Komatsu fc Seliak|[200l IReichardt et alJ[20ll . The 
tSZ power spectrum amplitude has been constrained 
through measur ements of the CMB at smal l angu- 
lar scales (e.g., ILueker et al.l 120101 IDas et al.l I2011bt 
IReichardt et al.ll2012D . However, the tSZ power spec- 
trum signature is dominated by high-redshift, low-mass 
clust ers that are n ot well studied at other wavelengths 
(e.g.. lHolderll2002f ). Significant modeling uncertainty for 
this population of clusters complicates the translation of 
the measured tSZ power spectrum am plitude into a con- 
straint on er 8 (e.g.. ILueker et al1l2010T) . 

An alternative approach to constraining erg and other 
parameters that affect the growth of structure is to study 
just those galaxy clusters that can be individually de- 
tected in millimeter-wave maps through their tSZ sig- 
nature. When rcdshifts are obtained for every cluster, 
this approach can constrain both erg and the equation 
of state of dark ene rgy (e.g.. iWang fc Steinhardtl [19981 : 
lHaiman et al.l[200ll ). The scaling of the observable (the 
number of clusters detected) with erg is even steeper 
than for the power sp ectrum, with number counts going 
roughly as a\° (e.g., iDudlevi I2012T) . Constraints based 
on number counts are nearly independent of those using 
the tSZ power spectrum, making the two probes nicely 
complementary. 

The thermal SZ bispectrum offers another approach 
that complements both the power s pectrum and cluster- 
detect ion methods. As shown in [Bhattac harva et al.l 
(|2012t hereafter B12), the tSZ bispectrum is dominated 
by more massive, lower-rcdshift clusters than the tSZ 
power spectrum. This population of clusters is subject to 
less modeling uncertainty than the higher- redshift, lower- 
mass clusters that dominate the tSZ power spectrum. 
Furthermore, the tSZ bispectrum is yet more sensitive 
to erg than the tSZ power spectrum and cluster num- 
ber counts. B12 demonstrated that the amplitude of the 
tSZ bi spectrum scales as B t sz oc a s 1_12 . iHill fc Sherwlr] 
(|2013f ) demonstrated a si milar scaling for the real-space 
tSZ skewness (T t 3 sz ), and I Wilson et al.1 (|2012h used this 
theoretical prediction and a measurement of the tSZ 
skewness in Atacama Cosmology Telescope (ACT) data 
to place a <5% constraint on erg. 

B12 also showed that by simultaneously constrain- 
ing cosmology and cluster physics with the tSZ bi- 
spectrum, one could make a precise prediction for the 
amplitude of the tSZ power spectrum. The tSZ con- 
tribution to the CMB power spectrum is difficult to 
separate fro m the contribution of the kinematic SZ 
kSZ) effect (ISvm^ aev fe Zel'dovic h 1980) in current data 
Reichardt et al.ll2012l ). However, under the assumption 
that the contribution to the bispectrum from the kSZ 
is negligible (see Section B~T1 for details), the bispectrum- 
based prediction for the tSZ power spectrum can be used 
to sharpen the measurement of the kSZ power spectrum. 
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The resulting constraints on the kSZ effect — which are 
intere sting for cosmology and for models of re ionization 
(e.g., Knox et al.l [l998t iGruzinov fc Hul [1991) — arc po- 
tentially much stronger than from the power spectrum 
alone. 

Finally, the no n- Gaussian signals from extragalactic 
emissive sources are also potentially interesting. Two 
populations of sources contribute significantly to mea- 
surements at the frequencies at which CMB experi- 
ments typically operate (roughly a few GHz to hundreds 
of GHz): synchrotron-dominated "radio sources" — 
primarily active galactic nuclei — and dusty star-forming 
galaxies (DSFGs), the integrated light from which pro- 
duces the cosmic infrared background (CIB). Measure- 
ments of the bispectrum contribution from radio sources 
and DSFGs can constrain source count models beyond 
the threshold for individually detecting sources, and with 
different flux weighting than measurements using the 
power spectrum. Perhaps more intriguingly, measure- 
ments of the bispectrum of the clustered CIB (fluctua- 
tions in the mean CIB emission due to large-scale struc- 
ture) have the potential to constrain models of galaxy 
and star formation beyond what can be done with CIB 
power spectrum measurements. 

The bispectra of the secondary anisotropics and fore- 
grounds are characterized by different angular scale de- 
pendence as well as different spectral signatures. Multi- 
band measurements across a wide range of angular scales 
can therefore be used to isolate the different contri- 
butions. The purpose of this work is to present bi- 
spectrum measurements in three frequency bands cen- 
tered at roughly 95, 150, and 220 GHz, (corresponding to 
wavelengths of —3.2, —2.0, and — 1.4 mm), using approx- 
imately 800 deg 2 of sky from the South Pole Telescope SZ 
(SPT-SZ) survey. We concentrate on the range of angu- 
lar scales (or multipole number) at which the secondary 
and foreground sources of non-Gaussianity arc expected 
to dominate, namely 9<W' (Z>1000), and use this infor- 
mation to simultaneously fit for the contributions from 
tSZ and from the clustered and spatially uncorrelated 
contributions from emissive sources. 

The paper is organized as follows: we briefly describe 
the data products used in this analysis in Section [2j we 
present the method for estimating the bispectrum in Sec- 
tion [3] we describe the model used to fit the resulting 
bispectrum measurements in Section [4j wc present mea- 
sured bispectra and model fit results in Section [SJ we dis- 
cuss the implications of these results for cosmology and 
models of source emission in Section |6j and we conclude 
in Section [7J 

2. DATA 

The SPT (|Carlstrom et al.l [201lT) is a 10-meter tele- 
scope located at the National Science Foundation's 
Amundsen-Scott South Pole station in Antarctica. The 
2500-deg 2 SPT-SZ survey was completed in November 
2011. This survey produced maps in three frequency 
bands (95, 150, and 220 GHz) to a depth such that 
the rms 150 GHz noise level in any of the maps is 
< 18/xK per 1-arcminute pixel. Scientific results from 
partial or full SPT-SZ survey data i nclude catalogs of 
clust er s discovered via the SZ e ffect (|Vanderlinde et al.l 
[20101 IWilliamson et~aTl l20rl IReichardt et al.l 120131) . 
catalogs of emissive sources (including a new popu- 



latio n of strongly lens e d, du sty, high-redshift galax- 
ies, iVieira et al.l 120101 |2013[ ). an d measurements of 
the primary CMB power spectrum (|Keisler et al.l [20111 
iStorv et all 120121 ) and of the secondary CMB and fore- 
ground power spectra (ILueker et al.ll2010l : iShirokoff et al.1 
[20111; [Reichardt et all [20121 ). 

IReichardt et al.l (|2012t hereafter R12) used -800 deg 2 
of SPT-SZ survey data to measure the small- angular- 
scale CMB power spectrum and place unprecedentcdly 
tight constraints on the fluctuation power of the tSZ, 
kSZ, and CIB at SPT observing frequencies. In this 
work, we measure the bispectrum over the same area of 
sky as was used in R12 to measure the power spectrum. 

The precision expected for a bispectrum measurement 
using 800 square degrees of SPT data should be high 
enough to provide useful new information. The B12 
modeling uncertainty on A t sz (the amplitude of the tSZ 
power spectrum) given a perfect measurement of -Btsz 
(the amplitude of the tSZ bispectrum) is 7% for reason- 
able assumptions about gas physics and —15% for the 
most extreme assumptions. The predicted scaling be- 
tween the two quantities is -Btsz oc A\§ z (B12). This 
implies that a —20% measurement of £?tsz will be lim- 
ited in its constraint on A t sz by modeling uncertainty 
in the pessimistic case, and a —10% measurement will 
be limited by modeling uncertainty in the default case. 
B12 showed that a survey with the depth and sky cover- 
age of the full 2500-deg 2 SPT-SZ survey should be able 
to make a —6% measurement (or, equivalcntly, a — 16c 
detection) of the tSZ bispectrum in the 150 GHz data 
alone. Thus, an —11% measurement (— 9cr detection) 
should be achievable with 800 deg 2 of SPT-SZ 150 GHz 
data, with the 95 GHz data adding additional detection 
significance. Depending on the level of modeling uncer- 
tainty assumed and the additional uncertainty on -Btsz 
due to systematics (see Section 14.2. ip and sample vari- 
ance (see Section |6.3.1[) . this level of detection has the 
potential to provide interesting A t sz constraints. 

The data analysis up to the point of generating maps 
from single observations of each field is identical to 
that in R12, and we refer the reade r to that work and 
other SPT data ana lysis papers fe.g.. ILueker et al]|2010l : 
iSchaffer et al.ll2011f ) for details of the analysis. Briefly, 
raw, time-ordered detector data from a single observa- 
tion of a given sky field are relatively calibrated, data 
selection cuts are applied, high-pass filters are applied 
to the data to downweight noise from the atmosphere 
and the readout, and the data are binned into a single- 
observation map, using simple inverse-variance weight- 
ing. 

R12 used a cross-spectrum analysis to estimate the 
power spectrum. This choice, involving the cross- 
correlation of single-observation maps, was made mainly 
to avoid noise bias in the power spectrum. In the bi- 
spectrum analysis, no noise bias is expected if the instru- 
ment /atmospheric noise is Gaussian. For the bispectrum 
estimation, we therefore use a single coadded map for 
each field and frequency band, made by taking inverse- 
variance- weighted averages of all single-observation maps 
(the total number of single observations ranges from 
—200 to —1000). We use simulated observations to char- 
acterize the effect of instrument beam and timestream 
filtering and to estimate bispectrum uncertainties (see 



4 



Crawford ct al. 



Section [3] for details). In this work, we use the single- 
observation maps and simulation products from the R12 
analysis. 

The maps used in R12 and in this work are constructed 
from data taken in the 2008 and 2009 SPT observing 
seasons. In 2008, only detectors at 220 GHz and 150 GHz 
produced science-quality data; in 2009, science-quality 
data was produced in all three frequency bands. The two 
2008 fields are ~100 deg 2 and roughly square on the sky; 
the three 2009 fields are ~200 deg 2 and extend roughly 
twice as far (in real degrees on the sky) in right ascension 
as they do in declination. To simplify the bispectrum 
calculation, we split each of the 2009 fields into a left 
and a right half, each of which is roughly the dimensions 
of the 2008 fields, leaving us with eight ^100 deg 2 fields 
of similar shape. The total sky area analyzed, corrected 
for any overlap between fields and for regions near bright 
sources that are interpolated over, is 837 deg 2 . 

3. BISPECTRUM ESTIMATION METHOD 

Previous estimates of non-Gaussianity in CMB data 
(primordial or otherwise) have generally made use 
of an estimator characterizing a single amplitude 
for the non-Gaussian signal. This amplitude pa- 
rameter is /nl for primordial non-Gaussianity (e.g., 
lYadav fc Wanderdl2008t iKomatsu et~aH201lD and (T?^) 
for non-Gaussianity due to tSZ (|Wilson et al.ll2012l ). An 
alternate analysis method is to calculate the three-point 
function or bispectrum in full generality, then extract 
the best-fit amplitude of a given non- Gaussian signal 
template from the full bispectrum. This more general 
approach does not require assumptions about the angu- 
lar scale dependence of the non-Gaussian signal. It also 
allows the freedom to simultaneously measure different 
sources of non-Gaussianity, such that signals that are not 
of interest (for example the bispectrum due to emissive 
sources in an analysis targeting the primordial CMB bi- 
spectrum) can be marginalized over. 

Historically a calculation of the full bispectrum has 
been avoided because it is computationally un feasible 
for full-sky datasets (c.g. lYadav fc WanderdfeoiOt ). How- 
ever, over a small patch of sky, one can take advantage 
of the flat-sky approximation, allowing spherical har- 
monic transforms to be replaced by fast Fourier trans- 
forms (FFTs). In this work, we use the flat-sky ap- 
proximation and calculate the full, three-dimensional bi- 
spectrum over ~800 deg 2 , o r roughly 2% of the full sky. 
iFer guss on fc Shellard (|2009f ) find that the bispectrum es- 
timated using the flat-sky approximation agrees with the 
full, curved sky analysis to < 1% if all I values are greater 
than 150. In this work, we only consider multipole values 
of I > 1000, so the flat-sky approximation is a very good 
one for this analysis. 

3.1. Defining the estimator 

Following iHul (|2000ft . we define the full-sky (angle- 
averaged) bispectrum through the relation 



\Q>l±Tni Q'l2m2 %m3 ) 



h h h 
mi ni2 TO3 



B 



I1I2I3 



where ai m are the coefficients of the spherical harmonic 
expansion of the full-sky temperature field, and the 
Wigner 3-j symbol enforces selection rules on the triplets 



of angular modes. In the flat-sky limit, the equivalent 
relation is 

(a(h)a(h)a(l 3 )) = (2ir) 2 5 2 (h+h + h) B(h,l 2 ,h), (2) 

where a(l) are the coefficients of the Fourier transform 
of the partial-sky temperature field, defined by 



AT(x) 



d 2 l 



a(l)e 



/1-X 



(3) 



and the Wigner 3-j symbol has been replaced by a 
Dirac <5-function enforcing that the locations of the three 
Fourier modes form a triangle in /-space. If the signal 
responsible for the bispectrum has no preferred direction 
in the sky, we can write this as 

(a(h)a(h)a(h)) = (2ir) 2 S 2 (h+h + h) B(h,l 2 ,l 3 ), (4) 

where li = In this limit, the flat-sky bispectrum 

B(li,l2,h) is equivalent to the full-sky reduced bi- 
spectrum bi t i 2 i 31 defined through the relation 



B, 



I1I2I3 



(2ii + l)(2i 2 + l)(2i 3 + l) 

47T 



h h h 




(5) 

(jKomatsu fc Spergel|[200l . 

For a finite-sized map of angular extent A9, Fourier 
modes separated by Al < 2ir/A9 are indistinguishable 
from one another. To avoid significant correlations be- 
tween mode triplets, the bispectrum of such a map should 
be cal culated in bins of size Al > 2ir/ A8. iSantos et al.l 
(|2002| ) define an estimator of the bispectrum in bins of 
width Al: 



B(hj2, h) ■ 



1 



Ni ly i 2 .i 3 -Al 

]T Re[a(l 1 )a(l 2 )a(l 3 )] 



(0) 



U-Al/2<\U\<li+Al/2 

subject to the constraint li + I2 + I3 = 0, where where 
Ke[x] is the real part of the complex number x, and 
A r ; 1 ./ 2i ; 3; A/ is the number of mode triplets that satisfy the 
I bounds and the triangle condition. We add the option 
of assigning different weights to different mode triplets 
in a bin by redefining the estimator to be 



B(h,l2, h) : 



1 



(7) 



W(h,h,h) Re [a(l 1 )a(l 2 )a(l 3 )] 



where the sum is over the same mode triplets as in Equa- 
tion [51 and the same triangle condition applies. The 
weighting scheme used in this analysis is described in 
Section [3.41 To compare the partial-sky bispectrum esti- 
mate to predictions for the full-sky reduced bispectrum, 
we divide by the map area Ar2 map . 

We only calculate the auto-bispectrum 
B{li,Vi;l2,Vi',lz,Vi) for each of our three frequency 
bands. We are not exploiting the full information 
in the bispectrum, since there are also seven unique 
(1) cross-bispectra B(l±, 1^; l 2 , Vj', I3, Vk)i where Vi, i/j, and 
Vk are not all the same. To keep the computation and 
interpretation as simple as possible for this first result, 
we postpone investigation of the cross-bispectra to a 
future publication. 
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3.2. Treatment of the instrument beam and filter 
transfer function 

The maps from which we estimate the bispectrum in 
this work do not contain unbiased estimates of the true 
sky temperature at all angular scales, due to the filtering 
applied to the detector time-ordered data and due to 
the instrument beam. However, in the limit that the 
filtering is a purely linear operation that is uniform over 
the map, we can define a single Fourier-domain function 
F(l) that describes the combined effects of beam and 
filtering on the coefficients a(l). We obtain an unbiased 
estimate of the true a(l) by dividing the raw, biased 2,(1) 
(estimated by directly Fourier transforming the map) by 
F(V). We estimate F(l) by taking realistic mock skies 
(described in detail in Section I3.4.2[) . convolving them 
with the measured beam, and running them through the 
full pipeline up to the coadded map stage. We calculate 
the two-dimensional Fourier-domain ratio of output to 
input maps and use this as our estimate of F (1). 

Due to the finite size of the detector array and the sky 
fields measured, the timestream filtering process cannot 
truly be represented by a purely linear map-domain fil- 
ter that is uniform across the map. However, we ex- 
pect any errors in the measured transfer function due 
to this non-linearity or non-uniformity to be very small 
for two reasons. First, the primary effect of departures 
from our idealization of the transfer function is to alias 
power at low spatial frequenc ies to high spatial frequen- 
cies fe.g.. lSchaffer et al.ll20Tll ). This is a potentially sig- 
nificant problem when the signal spectrum is very red (as 
in measurements of the primary CMB power spectrum), 
but the expected bispectrum signal in the I range treated 
here is much closer to flat in I. Second, the input signal 
we use in the simulations is expected to be a reasonable 
approximation to the true input signal, which minimizes 
the impact of non-linearity. Furthermore, even if the as- 
sumed input signal is significantly wrong, the expected 
errors on the trans fer function are small. For example, 
ILueker et al.l (|2010f) tested the filter transfer function for 
SPT power spectrum analysis and found that changing 
the input power from extragalactic sources by a factor of 
two made < 1% changes in the inferred transfer function. 

3.3. Apodization and compact- source treatment 

We use FFTs to calculate a(l) from our maps, and 
FFT algorithms assume periodic boundary conditions. 
To avoid injecting false signals from discontinuities at 
the edges of the map, we enforce periodic boundary con- 
ditions by apodizing each map after embedding it in a 
larger grid and zero-padding. For a given sky field, we 
create the apodization mask as follows: we start with a 
map representing the total inverse- variance weights used 
in creating the final coadded map for a given field. We 
smooth the weight map with a Gaussian kernel with 
FWHM = 4', divide by a fiducial weight value (equal 
to 80% of the median weights, a value empirically deter- 
mined to produce well-behaved masks), and set all values 
above 1.0 equal to 1.0. The resulting apodization mask 
also downweights the edges of the map, which are noisier 
than the ncarly-uniform-noise main map region. 

The apodization in real space is a convolution in 
Fourier space, with the convolution kernel being the 
Fourier transform of the apodization mask. This oper- 



ation will correlate otherwise uncorrelated Fourier co- 
efficients. However, if the mask has a smooth taper 
and no features on small angular scales, the Fourier- 
domain convolution kernel will be compact and of width 
Al ~ 2n/A9; i.e., the mode correlation induced will be 
approximately the same as the correlation that arises just 
from the finite size of the map. If the I bins used in the 
bispectrum analysis are sufficiently wide, this correlation 
can be ignored. We choose a bin size of Al = 200; the 
amplitude of the 2-d Fourier transform of the apodiza- 
tion mask for a typical field at I = 200 in either dimen- 
sion is <0.01 times the I = value. We have verified 

through simulated observations that no detectable cor- 
relation between bispectrum values in bins of this size 
is induced from our apodization masks, and we ignore 
any effects of the mask — beyond correcting for its effect 
on Af2 map , the area over which modes are measured — in 
subsequent analyses. 

It is common practice in CMB a nalyses to also mask 
compact sources in the maps (e.g., iHinshaw et al.l 120031 : 
ILueker et al.ll201d : iFowler et al.ll2010f r Although we are 
interested in the bispectrum from compact sources such 
as galaxy clusters, radio sources, and DSFGs, we do want 
to remove the signal from the very brightest of these 
sources. Masking the brightest sources reduces sample 
variance — and, in some cases, uncertainty in modeling 
their bispectrum — and it allows us to estimate how much 
of the bispectrum signal is coming from sources that have 
not been detected and characterized in other analyses of 
the same data (see Section 14.1.21 for details). However, if 
we were to multiply the maps by a mask that had holes at 
source locations, we could no longer ignore the effects of 
the mask on the measured bispectrum because the mask 
would now have small-scale features. 

To avoid having to c a lculat e the bispectrum equiva- 
lent of the iHivon et al.l (|2002l ) pseudo-C; mode-mixing 
kernel, we instead choose to remove compact-source sig- 
nals from our maps vi a harmonic inpainting. W e use the 
procedure described in Ivan Engelen et all (|2012l ): briefly, 
a square region around a bright source in the map is 
interpolated over using the correlation properties mea- 
sured in the rest of the map to create the interpolates. 
In all maps, we interpolate over sources detected a t 5a 
at 150 GHz (using the catalogs of lVieira et al.ll2~010l and 
L. Mocanu et al., in preparation). Because the different 
sky fields used in this analysis were observed to slightly 
different depths, the 150 GHz flux cut to which this sig- 
nificance is equivalent varies from 5.7 mJy to 6.6 mJy 
In some versions of the bispectrum analysis, we also in- 
ter polate over galaxy clus ters above a given mass from 
the iReichardt et all (j2013| ) catalog (see Section 14.1.21 for 
details). In all cases, the inpainting is done over only 
~1% of the total map area. We test for any effects of 
this inpainting on bispectrum measurements by study- 
ing simulated data. We see no effect from inpainting — 
beyond the obvious effect of eliminating the contribution 
to the bispectrum from the painted-over sources — at the 
sensitivity of our tests, which probe down to roughly 1% 
of the expected secondary /foreground bispectrum signal 
level. 
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The signals we are interested in for this work are non- 
Gaussian contributions to the sky temperatures recorded 
in our maps. Any purely Gaussian contributions will, 
by definition, produce no average bispectrum. However, 
Gaussian components of the maps will contribute to the 
variance on the bispectrum measurement. Therefore, 
in addition to instrumental and atmospheric sources of 
noise, astrophysical and cosmological sources of Gaus- 
sian power (such as the primary CMB and populations 
of emissive sources) will also contribute noise to this anal- 
ysis. 

When constructing the binned bispectrum, we average 
together the products of many mode triplets to estimate 
the bispectrum in each bin. Considering a single mode 
triplet , the variance on the product of three Fourier coef- 
ficients 0(11)0(12)0(13) has contributions from the Gaus- 
sian components of the map (including the Gaussian part 
of intrinsically non-Gaussian sources of power such as 
emissive sources) as well as from the non-Gaussian com- 
ponents. In the limit of very small non-Gaussian signa- 
tures in the maps, the Gaussian components dominate 
this variance. 

The three most significant sources of fluctuation power 
for the maps used in this analysis are noise (instrumen- 
tal and atmospheric), primary CMB fluctuations, and 
power from extragalactic sources (mainly DSFGs) be- 
low the SPT detection threshold. We measure the SPT 
noise to be Gaussian at the level necessary for this anal- 
ysis by calculating the bispectrum of noise-only maps 
(described in more detail in Section 13.4.11 below) using 
the bispectrum estimator and obtaining the expected 
null result. The non-Gaussianity of the primary CMB 
is constrained through estimates of /nl from WMAP 
data. The signal from extragalactic sources is intrin- 
sically Poisson-distributed, but at flux levels at which 
we expect many sources per SPT beam, the distribution 
of fluxes in a single SPT beam or pixel will approach 
a Gaussian. Hence the extragalactic source signal will 
have a non-Gaussian part, which we consider as a poten- 
tial bispectrum signal, and a Gaussian part, which will 
contribute to the noise of the bispectrum measurement. 
The processes used to estimate weights and error bars 
are described in more detail in Sections 13.4.11 and 13.4.21 

When averaging many mode triplets together, we use 
weights derived from estimates of the Gaussian variance 
from noise, primary CMB, and point sources (estimated 
as described in the next section). Error bars on the 
binned bispectrum values are also constructed from es- 
timates of the Gaussian variance in the maps. These 
weights and error bars do not take into account non- 
Gaussian signal variance. Signal variance for the binned 
bispectrum can be significant: individual strong sources 
in the maps produce bispectrum signals that are highly 
correlated across mode triplets and will vary from field to 
field across the sky. We take this signal variance into ac- 
count when interpreting our model fits in a cosmological 
context, as described in Section IB. 3. II 

3.4.1. Weights 

In this work, the weights used in the bispectrum es- 
timator described by Equation [7] are constructed from 
estimates of the Gaussian variance for each mode triplet. 
A purely Gaussian component with angular power spec- 
trum C(l) will contribute variance to the bispectrum 



measurement equal to 

1 2 , 1 3 )| 2 ) = C(li)C(l 2 )C(l 3 ) (8) 
for li 7^ I2 7^ I3, so we use as our bispectrum weights 

Our estimate of the total C(l) contributing to bispectrum 
variance is the sum of the contributions from primary 
CMB fluctuations, emissive sources below the SPT de- 
tection threshold, and instrumental/atmospheric noise. 

The input CMB and point-source spectra are identi- 
cal to those used in the simulated skies in R12, and we 
refer to that work for details on the input spectrum. 
Briefly, the contribution from the primary CMB is a 
ACDM mode l of the primary CMB power spectrum from 
iKeisler et al.l (|2011|) . and the contribut ion from emis- 
sive so urces is based on measurements in [Shiroko ff et al.l 
(|2011| ). The contribution from instrumental and atmo- 
spheric noise is estimated using the two-dimensional l- 
space noise power spectrum associated with the map. 
The noise power spectrum for a given map is calculated 
via a jackknifc procedure, in which many combinations 
of the individual-observation maps are created, each one 
with half the maps multiplied by —1 so that the result- 
ing combination has no astronomical signal. The power 
spectrum of each of these combinations is computed, and 
the results are averaged to produce the final estimate of 
the noise power spectrum C no i se (l), which is divided by 
the square of the beam and transfer function estimate 
F(l) (see Section 13. 2|) before being included in the vari- 
ance calculation. 

The expected variance of individual Fourier modes 
due to instrumental/atmospheric noise and the primary 
CMB varies across the Fourier plane. The primary CMB 
variance depends on I, and the noise power spectrum 
depends on 1 (see iSchaffer et aTll2011l for details). Our 
weights take these variations into account. 

We address some mode triplets as special cases. For 
mode triplets in which two of the 1 values are the same, 
the bispectrum variance will be elevated by a factor of 
three (because (a(li)a(h)a(l 3 )) -> (a 2 (li)o(l 3 ))). Addi- 
tionally, because the estimator we are using takes the 
real part of 0(11)0(12)0(13), the same noise elevation oc- 
curs for mode triplets in which li = I2 or li = — 13, 
etc. Because the fraction of such mode triplets is small 
(fewer than 0.01% of the total number of triplets over 
the I range considered here), and they would be signifi- 
cantly downweighted in our weighting scheme, we choose 
to simply give these triplets zero weight in the estimator. 

3.4.2. Binned Bispectrum Error Bars 

The total inverse-variance weight in a given I bin is 
also directly related to the uncertainty on the estimate 
of the bispectrum in that bin: a 2 (B(li, I2, 13)) should be 
equal to l/Wtot, where Wtot is the total weight in that 
bin 

w tot = W(h,h,h). (10) 

ii-A;/2<|ii|<ii+A;/2 

In practice, we estimate the bispectrum variance 
a 2 (B(li, I2, h)) from the scatter in the bispectrum mea- 
sured from 100 simulated observations. As a cross-check, 
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we have compared the binned bispectrum error bars es- 
timated from the weights to those estimated from the 
simulations. The two are the same up to an overall scal- 
ing factor (of order unity) related to the ratio of the 
area under the apodization mask to the total area of 
the field, which affects the number of truly independent 
mode triplets in a bispectrum bin. This decrease in in- 
dependent modes is due to mode correlation from the 
mask. This correlation also increases the number of mode 
triplets with elevated noise (as described in the previous 
section). However, for masks that are smoothly tapered 
and cover nearly the full field (such that they are strongly 
localized in the Fourier domain), the fractional increase 
in noisy triplets — which were < 0.01% of total triplets to 
begin with (see Section 13.4. 1[) — is small. Therefore, we 
ignore this effect in this analysis (i.e., we include these 
triplets in the estimator and give them the weight they 
would have in the absence of masking). 

We create our final estimate of bispectrum error bars 
by running 100 sets of mock observations of our eight 
fields through the bispectrum estimator and calculat- 
ing the scatter in the measured bispectrum in each I 
bin across the 100 sets. The input skies are composed 
of Gaussian realizations of the same sky power spec- 
trum used for the weights described in the previous sec- 
tion, convolved with the measured beam-and-filter trans- 
fer function -F(l), with a realization of the instrumen- 
tal/atmospheric noise added. For the simulated noise in 
a given field, we use one of the signal-free combinations of 
individual observations used in the noise power spectrum 
estimation described above. The simulated observations 
are apodized in the same manner as the real data, so any 
effects of the apodization are taken into account in the 
uncertainty estimation. The simulations do not contain 
correlated signal between fields, so overlap between fields 
is not taken into account; however, the overlap is ~2% 
of the total area, and any error caused by neglecting it 
is small compared to the statistical precision of our final 
results. 

We have examined the l-bin-to-Z-bin covariance over 
the 100 simulated observations and see no bin-to-bin cor- 
relation above the level expected from this number of in- 
dependent measurements. We treat the noise in each 
I bin as independent in all subsequent analyses. We 
do see correlations within an I bin among the three ob- 
serving bands, as expected given the contribution to the 
bispectrum variance from Gaussian sky signal (partic- 
ularly the primary CMB, which is perfectly correlated 
among the three bands). We account for these correla- 
tions by expressing the bispectrum covariance as a 3 x 3 
matrix in each I bin. We also note that we detect no 
mean bispectrum in these simulated observations, which 
include actual instrumental/atmospheric noise, demon- 
strating that the noise is Gaussian to a very good ap- 
proximation. The bispectrum covariance matrix used in 
further analysis is thus 



Cij{l\, h, h) — ( BgimQi, I2, 13, Vi) B s i m (li, I2, 13, Vj) ) , 



(11) 

where i? s im is the estimated bispectrum from a single 
simulation, and the expectation value is over all simula- 
tions. 



4. BISPECTRUM MODELING AND MODEL FITTING 

To interpret any detection of the secondary / foreground 
bispectrum in an astrophysical or cosmological context, 
we need a model of the expected signal and a model- 
fitting procedure. In this section, we describe the signal 
models we adopt and the procedure we use for fitting 
the multi-band data to these models. We also describe 
how we account for instrument-related systematic effects 
such as uncertainties in beams, spectral bandpasses, and 
calibration. 

4.1. Signal models 

We include three types of non-Gaussian signal in our 
modeling: tSZ from galaxy clusters, the spatially un- 
correlated signal from extragalactic sources (hereafter 
"point sources," since the vast majority of such sources 
will appear point-like at the ~1' resolution of the SPT), 
and the expected clustered emission from one source pop- 
ulation (DSFGs). We describe our modeling choices for 
each of these in turn, but first we note that we do not in- 
clude other potential sources of millimeter-wave signal — 
particularly the kSZ effect, clustered radio sources, and 
galactic foregrounds — in the bispectrum model. 

We do not include the kSZ primarily because 
none of the predicted kSZ-gencrating mechanisms — 
including the peculiar velocity o f free electrons in 
galaxies (IQstriker fc Vishniad 11986^ or galaxy clusters 
( Sunvaev h Zel'dovichl Il980fl. and pat chy reionization 
( Knox et al.lll998l : lGruzinov k, HvJll998t ) — is expected to 
impart a net skewness on the CMB temperature distribu- 
tion. This is because the velocities of ionized gas in the 
universe should be random with respect to the observer, 
meaning that the induced kSZ signal should be symmet- 
ric around the mean CMB temperature. We expect the 
signal from the clustered radio background to be much 
smaller than the clustered DSFG signal (sec Section l4.1.3l 
for details), and we do not include such a term in our sig- 
nal model. We do not include any expected signal from 
our own galaxy, both because the sky fields used here 
are at high galactic latitude, and bec ause such signals 
are e xpected to fall steeply with I fe.g.. lFinkbeiner et al.l 
119991) and thus be negligible at the angular scales or I 
values of interest to this work. 

4.1.1. Spatially uncorrelated ("Poisson") point-source 
contribution 

We introduce the model for spatially uncorrelated 
point sources first to illustrate the basic properties of 
the bispectrum arising from any population of discrete, 
spatially uncorrelat ed sources with a given angular pro- 
file. Following, e.g.. lHall etail (poTol) . we will use "Pois- 
son" as shorthand for the component of the point-source 
contribution to the power spectrum or bispectrum that 
arises from spatially uncorrelated sources. 

For a CMB map made at observing frequency v with 
pixels of size AQ p (where, for this toy example, the pixel 
size is much larger than the beam size), containing only 
a point source of flux S, we can write the signal in the 
map as 



,(x) 



Tpcak, if x S source pixel 



0. 



otherwise 



(12) 



We know that the total flux in the map must equal S, 
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which means 



T pea ,k(v) = g(x v ) X 



(13) 



where g(x v ) is the conversion factor between CMB fluc- 
tuation temperature and flux per solid angle (in units of 
Jy sr _1 ) at observing frequency v: 



2fc_e / k B T CMB 
c 2 I h 



xte Xv 



( e x„ _ 1)2 



(14) 

and x v = hv / {ksTcMB,)- For angular frequencies well 
below the cutoff of the pixel window function (I <C 
2tt/ Af2 p ), the Fourier transform of this map is 

Osourcc(l) = J d x T' sourcc (x) e (15) 

— A£lp Tpcak ^ 

= S (i„)Se- fl,x ™». 

The estimated bispcctrum due to this single source of 
flux S is then 

B(h,l 2 ,h, v) = -^J2 Rc Hh)a(h)a(-(h + 1 2 ))] 



iVj 

t 

g 3 (x.) S 3 
Ni 



-ill 'Xg, 



xe 



i(li+l 2 )-x a 



= «? 3 (^) S 3 , 



(16) 



where we have used the triangle condition li + 1 2 + 13 = 
to redefine I3. When there are two or more sources in 
the map, Equation 1161 becomes 

B{h,h,h^)=g\x v ) (Sl + Sl + ... + S% (17) 

+ cross terms) , 

where the cross terms are of the form 
S ,2 S , 2 e" i;(ll+l2) (xi " X2) . If the sources are spatially 
uncorrelated, the phase of the cross terms is random, 
and these terms will on average be zero, leaving 

B(h,l 2 ,l 3 , v) = g\x v ) (S\ + Si + ... + S 3 N ) (18) 

as the only nonzero average bispectrum contribution. 

For a population of sources with number density per 
unit solid angle per unit flux dN/dS/dtt, Equation [TS] is 
easily generalized to 

C°° dN 

B(h,l 2 ,l 3 ,v)=g 3 (x v ) Afi map y s3 jsdn dS > ( 19 ) 
or, if sources have been cleaned down to some threshold 

Q 



B(h,l 2 ,l 3 ,v) = g z {x v ) 



which, after we apply the A f2 maD correction, is identica l 
to the familiar result of, e.g.. lKomatsu fc Spergell (|2001l) . 

We have chosen to only use bispectrum shape informa- 
tion to fit for the Poisson contribution to our multi-band 



bispectrum (i.e., we fit for a contribution that is flat in 
I). We include a single free parameter in the fit for the 
amplitude of the Poisson source bispectrum in each ob- 
serving band. 

4.1.2. Thermal SZ model 

We use the model described in B12 for the bispectrum 
due to the tSZ effect in galaxy clusters. We describe 
the model briefly here and refer the reader to B12 
for details. The tSZ bispectrum is calculated assum- 
ing the signal arises from spatially uncorrelated galaxy 
clusters and so is conceptually identical to the result 
from the previous section, with the modification that 
the intrinsic angular profile of the clusters modifies 
the bispectrum shape. For a family of astrophysical 
sources with angular profile F(pc) or Fourier-domain pro- 
file F(l), a SOU rce(l) -> F(l) a source (l), and the bispectrum 

B{hM,h) -> F{h)F{l 2 )F{h)B{h,l 2 M)- 

The tSZ bispectrum at multipole numbers l\, l 2 , and 
I3 and observing frequency v is calculated as the in- 
tegral over cosmological volume of the product of the 
Fourier-domain cluster pressure profile at the three I val- 
ues, weighted by the halo mass function: 

B ts ,(hj 2 ^)-n^)Jd Z d ^JdinM d -^ 

x y(M,z,h)y(M,z,l 2 )y{M,z,l 3 ), (21) 

where f(xv) is the dimcnsionlcss function specifying 
the dependence of the tS Z on observing frequency 
(|Sunvaev k, Zerdovichlll98"0() . We do not include rela- 
tivistic corrections to f(x v ) (see discussion below). The 
Fourier-domain pressure profile y(M , z, I) i s calculated 
from the analytic model of lShaw et al.l (|2010| ). using their 
fiducial values of the intracluster medium (ICM) param- 
eters. The halo ma ss function dn(M, z)/d\n M is from 
iTinker et~aT1 (|2008jl . A ACDM cosmology is assumed in 
calculating the halo mass function, with fiducial parame- 
ters as in B12, namely tt b = 0.045, Q ro = 0.27, h = 0.71, 
n s = 0.97, and cr 8 = 0.8. 

The model is calculated at the center of each I bin in 
which the bispectrum is estimated from the data. The 
signal is sufficiently flat in I that this is within 2% of the 
value that would be obtained by calculating the model 
at higher resolution in I and averaging over the bin. The 
tSZ frequency factor f(x v ) is evaluated for each observ- 
ing band at the effective center frequency of the band 
assuming a non-relativistic tSZ spectrum. R12 calcu- 
lated these frequencies to be 97.6, 152.9, and 218.1 GHz. 
(This value is an average over the 2008 and 2009 observ- 
ing seasons for the 150 and 220 GHz bands, see R12 for 
details.) 

There is some uncertainty in how well mass func- 
tion fits to simulation output capture the high-mass 
end, with potent ial 5-10% uncertainties at halo masses 
above ~1O 15 M ([Tinker et al.ll2008t iBhattacharva et al.l 
120111 ). For this reason, we also calculate a version of 
the tSZ model with the mass function truncated above 
M 20 o(Pcrit) = 8 x 10 14 M o //i, where M 20 o(Pcrit) is the 
mass enclosed inside i? 2 oo(Pcrit), defined as the radius 
within which the average density is 200 times the criti- 
cal density. To compare to this prediction, we construct 
a bispectrum estimate in all three SPT bands with clus- 
ters above this same mass removed from the maps (using 
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the same inpainting procedure used for the point sources, 
see Section 13.31) . The cluster masses are taken from the 
iReichardt et al.l (120131) catalog and conve rted from M 50 o 
to M 2 nn assuming a iNavarro et al.1 (|1996| ) profile and the 
iDuffv et al.1 (|2008f ) mass-concentration relation. A total 
of four clusters above this threshold are masked in the 
full -800 deg 2 dataset. 

This level of cluster masking also reduces potential 
systematic errors caused by ignoring relativistic correc- 
tions to the predicted tSZ bispectrum amplitude. The 
most massive, hottest clusters h ave gas temperatures of 
>10 keV (e.g.. lAllen et al.1120081) . At these temperatures, 
the relativistic correction t o the tSZ temperatu re decre- 
ment is -6% at 150 GHz (|Nozawa et al.ll2000f ). Limit- 
ing the cluster sample to M2oo(/0 C rit) < 8 x 10 14 M Q //i is 
roughly equivalent to a temperature limit of T < 5 keV 
(jStanek et al.l l2010h . At these temperatures, the maxi- 
mum error in f(x v ) from ignoring relativistic corrections 
is -3%. 

We also construct tSZ models and bispectrum esti- 
mates with the mass function truncated and clusters 
masked above M 2 oo(p C rit) = 3 x 10 14 7\/ Q //i. This mass 
threshold closely approximates the selection of c luster s 
used to constrain cosmology in rRcichar dt et all (|2013j ). 
namely signal-to-noise ratio greater than five and z > 
0.3. This allows us to estimate the amount of information 
we are extracting from the tSZ bispectrum above and 
beyond what has already been extracted using cluster 
counts. A total of 117 clusters above this threshold are 
masked in the full —800 deg 2 dataset. For comparison, 
the to tal number of clusters used in the IReichard t et al.l 
(|2013| ) cosmological results was 100. 

Masking clusters in the data will lead to a smaller ab- 
solute amplitude of the tSZ bispectrum but will also lead 
to smaller sample variance, since the sample variance is 
dominated by the presence or absence in a map of the 
rarest, most massive clusters. This implies that there 
will be some optimal mass cut that reduces the com- 
bined noise-plus-samplc- variance unc ertainty on the tSZ 
bispectrum, similar to the results in iShaw et all (|2009l ) 
for the tS Z pow er spectrum. We investigate this further 
in Scction l6.3.1l 

4.1.3. Clustered CIB model 

Not only will emissive sources contribute to the Poisson 
bispectrum, they can also be spatially clustered, possibly 
leading to a detectable bispectrum signal with a different 
shape. Because this signal arises from a spatial modula- 
tion of the mean intensity, and because the CIB is much 
brighter than th e radio background a t SPT observing 
frequencies (e.g. lHauser fe Dwekl I2001D . we expect the 
clustering signal from DSFGs to be much stronger than 
that from radio sources, a s has been found in power spec - 
trum measurements fe.g.. lHall etl!][2TnTil : lHoldeH[2T)rl . 
We do not include the clustered radio background in our 
modeling and concentrate on the potentially measurable 
signal from the clu stered CIB. 

As pointed out in lLaca sa (2012), a single population of 
sources with clustering properties described by a single 
correlation function or angular power spectrum will have 
a bispectrum equal to 

B to t{hM,h) = a^C tot (h)C tot {h)C tot {h), (22) 



where a is a constant, and Ctot is the total Poisson- 
plus-clu stering a ngula r power spectrum: Ctot = Cciust + 
Cpoiss- lLacasal (|2012t) further showed that this formu- 
lation provides an accurate characterizatio n of the CIB 
bispec trum in the simulated sky maps of iSehgal et al.1 
(120101) . 

In this formulation, for / triplet bins in which the clus- 
tering signal dominates over the Poisson contribution, we 
can write the clustering signal as 

Bciustihihih) \J Cciust (h ) Cciust {h ) C c i U st ('3 ) ■ (23) 

We use this as the /-space template for the clustered CIB 
bispectrum in our model fits, with a simple power-law 
mod el for the clustered po wer spectrum C c \ us t(l) oc l~ n . 
Both lAddison et al.l (|2012ft and R12 have found this to be 
a good description of the clustered CIB power spectrum 
over a large range of angular scales, including the entire 
range considered in this work. We choose n = 1.2, which 
is con sistent with the best-fit values in lAddison et al.l 
(l20ll and R12. 

The spectral behavior of the clustered CIB at mil- 
limeter wavelengths is fairly well constrained from power 
spectrum measurements, and we use existing results to 
inform our model fitting. We assume a single spectral 
index a c over our three observing bands and model the 
clustered CIB bispectrum at observing frequency v as 

(v \ 3ctc 
— I (24) 

w 

(_k h h_\ n/2 

X V2000 2000 2000 J 

where B 2 ™°{is ) = B chlst (h = 2000, h = 2000, l 3 = 
2000, v Q ). Wc use v = 220 GHz in our CIB model. 

Using the results of R12, we adopt a nominal value 
and la uncertainty for the clustered CIB spectral index 
of a c = 3.72 ± 0.12 (see Section [4.2.11 for how this un- 
certainty is included in the fit). In Equation [24] v for 
each observing band is the effective center frequency of 
the band assuming a v a " spectrum. R12 calculated these 
frequencies to be 97.9, 153.8, and 219.6 GHz for a c = 3.5; 
the difference in effective frequencies for a c = 3.5 and 
a c = 3.7 is negligible. 

We have chosen not to explore more complicated CIB 
modeling, involving (for instance) spatial correlations be- 
tween the sources of the tSZ and CIB bispectra, different 
CIB bispectrum shapes, and spectral behavior beyond a 
single spectral index. As will be shown in Scction l6~2| the 
simple model adopted here is adequate to describe the 
data, and extensions to this model would not be strongly 
constrained using the data in this work. 

For the particular case of tSZ-CIB correlation, we note 
that this effect should be a far less significant bias to the 
measurement of the tSZ bispectrum than it is to the tSZ 
power spectrum. Galaxies in the high-mass, low-redshift 
clusters that make up the bulk of the tSZ bispectrum 
signal are measured to have significantly less star for- 
mation per unit mass than galaxies in lower-mass clus- 
ters o r low-redshift field galaxies (e.g.. iHashimoto et al.l 
Il998f ). Furthermore, the star- forming fraction is also seen 
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to increase with redshift (e.g.. iButcher fc Oemlerlll984D . 
This evidence all indicates that, even if tSZ-CIB correla- 
tion has a significant effect on the tSZ power spectrum — 
which is sourced by lower-mass, higher-redshift clusters 
than the bispectrum — the tSZ bispectrum is unlikely to 
be signifcantly affected. When R12 allow tSZ-CIB cor- 
relation as a free parameter, the best-fit tSZ power spec- 
trum amplitude shifts by < 20%, so we assume that the 
effect of tSZ-CIB correlation on our measurement of the 
tSZ bispectrum will be <?C 20% and hence subdominant 
to other uncertainties. 

4.2. Fitting procedure 

We use a simp le linear least-squares procedure (e.g., 
iPress et all 119860 to fit the measured bispectrum with 
the three-component model described above. Least- 
squares fitting results in the maximum-likelihood esti- 
mate of model parameters only if the measurement un- 
certainties are Gaussian-distributed. While the distri- 
bution of the individual /-space mode triplets is highly 
non-Gaussian, each Al = 200 bin contains > 10 4 of 
these triplets, so we expect the distribution of binned bi- 
spectrum uncertainties to be very nearly Gaussian. We 
confirm this through simulations. 

We fit all three bands' bispectrum data simultane- 
ously. The data vector has 3 x A^in elements, where 
Nbin = ('max/A?) 3 , / max is the maximum angular fre- 
quency used in the fit and Al is the size of the bins in 
/-space, in this case 200. None of the signal models de- 
scribed in the previous section have features on the scale 
of Al = 200, so this resolution should be adequate to 
characterize the measured bispectrum. The maximum I 
used in this analysis is 11,000, which was chosen by in- 
vestigating the factor by which the bispectrum variance 
is inflated by deconvolving the beam and transfer func- 
tion F(l) from the maps. The raw SPT map noise at 
high I has a nearly white spectrum. After deconvolving 
F(l), the noise power spectrum of the maps will be pro- 
portional to 1/F 2 (l) at high I. The bispectrum variance 
from this map will thus be proportional to 1/F 6 (1) at 
high I. This factor F 6 (l) is > 500 for all SPT bands at 
/ > 11,000. 

We write the data vector as where the index /i is the 
product of an /-bin index a and an observing-frequency 
index i (such that \x takes on a unique value for each bin 
l a , 1/3, 1'y and observing frequency i/j): 

dn = d[i a ] = d[ ict( g 7 ] = B(l a ,lp,l^,Vi), (25) 

a = aN{; in + [3N hi n + J, 
/i = 3 x a + i. 

The weight matrix, which is the inverse of the bin-bin- 
band-band bispectrum covariance matrix, is assumed to 
be block-diagonal in this analysis — i.e., we assume no bi- 
spectrum covariance between / bins due to noise or Gaus- 
sian sky components, but we do include the band-band 
covariance of the Gaussian sky terms (see Section l3~4|) . 
Under this assumption, each sub-matrix characterizing 
the covariance between bands for a given bispectrum bin 
is an independent 3x3 matrix given by the inverse of the 
band-band covariance matrix for that bin. This covari- 
ance matrix is estimated from simulations, as described 
in Section 13.41 Thus, we can write the weight and co- 



variance matrices as 

WV = C# (26) 

C^v = C[ia] \jb] = C[ia][ja]<5ob) 

where C\ ia ]\jg] = C[ ia ^y a ^] = Cij(l a ,lp,l 7 ), as defined 
in Equation llll and, again, the indices i and j run over 
observing bands and the remaining indices over /-space 
bins. 

The model or design matrix A is composed of five 
3 x A^bin-element vectors, each representing the unnor- 
malized signal shape for one of the signal components in 
all observing bands. The tSZ and clustered CIB vectors 
have non-zero values in all observing bands (although the 
model amplitude for the tSZ in the 220 GHz band is very 
small, since that band is very near the tSZ null), while 
the three vectors representing the Poisson point-source 
power in each band (assumed to be independent in this 
fit) are non-zero only in the iVbi n elements corresponding 
to that band. 

The five free parameters of the model A are the am- 
plitudes for each model component: tSZ, clustered CIB, 
and the Poisson point-source component in each of three 
bands. The best-fit values of these parameters are esti- 
mated from the data as 

-V = {A^W^A^y 1 A^W np d p , (27) 

where sums over repeated indices are assumed. This es- 
timate of parameters has a covariance matrix equal to 

C™ = SX U ) = (Al^A^y 1 . (28) 

4.2.1. Incorporating systematic uncertainties 

The disadvantage of a simple linear least-squares fit 
(in comparison to a more general parameter-space search 
such as a Markov chain Monte Carlo) is that there is no 
way to trivially incorporate systematic uncertainties in 
such quantities as the instrument beam measurement or 
the spectral index of clustered CIB fluctuations without 
introducing strong covariance among all / bins and sig- 
nificantly complicating the inversion of the covariance 
matrix. To retain the advantages of the linear fit (speed, 
simplicity, and robust parameter covariance estimation), 
we account for such systematics by running the linear 
fit many times, each time using a different realization of 
each systematic effect. We then calculate a systematic 
parameter covariance matrix by directly comput- 

ing the outer product SX^ <5A W in each realization and 
averaging over all realizations. 

We account for four independent sources of systematic 
uncertainty: 1) instrument spectral bandpasses; 2) the 
spectral index of CIB fluctuations a c ; 3) instrument cal- 
ibration; 4) instrument beams. B ased on measurements 
described in iSchaffer et alj (|2011[ ) and similar measure- 
ments in 2009, the band centers for SPT are estimated 
to be accurate to 0.3 GHz. The major source of this 
uncertainty is the frequency calibration of the Fourier 
transform spectrometer used to measure the bandpasses, 
implying that the uncertainty should be highly corre- 
lated between the three bands. For each systematic re- 
alization, we draw a bandpass error from a Gaussian of 
width o~band = 0.3 GHz and calculate the signal models 
using band centers shifted by this error. To account for 
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TABLE 1 
Systematic error accounting 



Parameter 


nominal value 


lcr uncertainty 


tSZ band center, 95 GHz 


97.6 GHz 


0.3 GHz 


tSZ band center, 150 GHz 


152.8 GHz 


0.3 GHz 


tSZ band center, 220 GHz 


219.1 GHz 


0.3 GHz 


CIB band center, 95 GHz 


97.9 GHz 


0.3 GHz 


CIB band center, 150 GHz 


153.8 GHz 


0.3 GHz 


CIB band center, 220 GHz 


219.6 GHz 


0.3 GHz 


CIB spectral index 


3.72 


0.12 


calibration, 95 GHz 


1.00 


0.018 


calibration, 150 GHz 


1.00 


0.016 


calibration, 220 GHz 


1.00 


0.024 



Distributions from which the syst ematic error realizations de- 
scribed in Sections 14.2.11 and 15.21 are applied. The band cen- 
ter and calibration uncertainties are highly correlated between 
bands. Uncertainty in the instrument beam in each frequency 
band is also taken into account using realizations of "error 
beams" as described in the text. 

uncertainty in the spectral behavior of the CIB, in each 
systematic realization, we draw a value for a c from the 
R12 distribution a c = 3.72 ± 0.12 and use that value in 
calculating the clustered CIB model. 

R12 estimated the calibration uncertainty in the three 
bands to be 0.035, 0.032, and 0.048 in power, or 0.018, 
0.016, and 0.024 in temperature. These uncertainties 
are highly correlated, because a primary source of un- 
certainty in each band's calibration is the noise in the 
WMAP power spectrum in the range I <G [650, 1000]. We 
approximate the calibration covariance matrix by assign- 
ing the fractional uncertainty at 150 GHz to all bands as 
a fully correlated component and augmenting that with 
uncorrelated components at 95 and 220 GHz to make the 
on-diagonal elements equal to the square of the measured 
uncertainties in each band. For each systematic realiza- 
tion we create a three-element vector o~ ca \(v) with the 
appropriate covariance and multiply the elements of the 
data vector d corresponding to band v by [1 + er ca i(z/)] 3 . 
The mean and lcr width of the systematic distributions 
for bandpass, CIB spectral index, and calibration errors 
are summarized in Table [TJ 

Uncertainties in the measurement of the instrument 
beam are incorporated by creating realizations of the 
beams using the full beam covariance matrix described in 
iKeisler et al.1 (|2011| ). For each systematic realization, a 
beam realization is created for each observing band and 
observing season, including the correlations in the un- 
certainties between bands and seasons. The bispectrum 
estimate from each 100 deg 2 held and each band is multi- 
plied by the cube of the ratio of the appropriate beam re- 
alization (for the year the field was observed) to the nom- 
inal beam. The data from all fields arc then combined us- 
ing the nominal weights, and this combined beam-error- 
multiplied bispectrum is used to construct the data vec- 
tor d. 

5. RESULTS 

5.1. Measured bispectra and single-band detection 
significance 

The bispectrum in each frequency band (with no clus- 
ter masking) , as estimated using the analysis procedures 
detailed in Section [3] and the bispectrum estimator dis- 
cussed in Section 13.11 is plotted in Figure [1] Values of 
the bispectrum and inverse- variance weight in each band 



and each l\, l 2 , h bin are available for download from the 
SPT website. 24 

We note that displaying this inherently three- 
dimensional data product in one dimension requires the 
data to be contracted along two axes. There is no "indus- 
try standard" for displaying bispectra, particularly real 
measurements with noise. B12 used the "skewness spec- 



trum" A(l) = yX^j i 2 ^(hh^h)', however, this quantity 

will have a positive expectation value for a bispectrum 
estimated from data with zero non-Gaussianity but finite 
noise and Gaussian sky power. We choose to define an 
/-space radius 



^rad 



l\ + l 2 + ^3 



(29) 



and to plot 



#(«rad) = ^ W(l 1 1 \ ' (30) 

where W(l\, I2, h) are the bispectrum weights in an I 
bin defined in Section 13.41 The error bar on this one- 
dimensional quantity is 



^(i^rad)) 



1 



1-2 A3 £'rad 



W(h,l 2 ,h)' 



(31) 



We emphasize that this contraction to one dimension is 
only for display purposes; all model fitting and x 2 esti- 
mation is performed in the full three-dimensional I space. 

Three features of the bispectrum data arc immediately 
clear from Figure [U 1) the data are highly inconsistent 
with zero bispectrum in all bands; 2) all bands show ev- 
idence of two signal components, namely a component 
that is larger at low l Ta ^ than high Z ra( j and is roughly 
consistent with a power law in / ra( j, and a flat-in-/ ra d 
component consistent with a Poisson point-source com- 
ponent; 3) the power-law component is negative at 95 
and 150 GHz but positive at 220 GHz, as would be ex- 
pected from a bispectrum dominated by tSZ at 95 and 
150 GHz and by clustered CIB at 220 GHz. 

The results of fitting this data using the multi- 
frequency model from Section [4] arc discussed below. 
However, to make a reasonably model-independent state- 
ment about the preference for these two components in 
the data, we first fit each band's data individually to a 
toy model that includes a flat component and a simple 
power law in I, with a power-law index chosen to match 
the observed signal in all bands. (This turns out to be 
roughly B(h,l 2 ,h) oc (hhh) 2/3 or B(l) cx I 2 along the 
diagonal.) Both components are detected strongly in all 
three bands, with the significance of the power-law com- 
ponent ranging from 5cr at 220 GHz to 9er at 150 GHz. 

To assess whether the data still prefer a power-law 
bispectrum component with the most significantly de- 
tected clusters masked, we estimate the bispectrum in 
each band while masking all clusters with M2oo(Pcrit) > 
3 x 1O 14 M //i, w hich is very close to a c ut at signal-to- 
noise of five in the rRcichard t et al.l (|2013t ) catalog. With 
this level of masking, the detection significance of the 

24 http://pole.uchicago.edu/public/data/crawfordl3 
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Fig. 1. — Measured bispcctrum (with no clusters masked) in each of the three SPT frequency bands. Bispectra have been collapsed 
from three dimensions to one dimension as described in the text. The solid black line shows the best-fit model estimated from the full 
3d bispcctrum (collapsed to Id using the same procedure and weighting as used for the data). The three individual components of the 
best-fit model are also plotted: tSZ (short-dashed red line), clustered CIB (dot-dashed purple line), and the Poisson point-source component 
(long-dashed green line). See Section [3] for more details on the model. The bispectrum error bars shown are statistical only. The data and 
best-fit models shown are for the nominal values of the systematic parameters and with no cluster masking (see text for details). 



power-law component at 95 and 150 GHz data is much 
reduced but still > 2a at 95 GHz and ~ 1.5er at 150 GHz. 

Perhaps most intriguing is the detection of a power-law 
component in the 220 GHz data, which is near the tSZ 
null and should not be measuring a tSZ bispcctrum. We 
interpret this signal as the bispectrum of the clustered 
CIB, and we discuss the implications of this signal in 
Section O 

5.2. Results of model fits 

Having established that the bispectrum data in each 
band contain significant detections of a power-law com- 
ponent and a flat-in-Z component, we move on to fitting 
these data to the model described in Section 14.11 us- 
ing the fitting procedure described in Section 14.21 As 
described in Section |4.2.1[ the linear least-squares fit is 
repeated many times with different realizations of sys- 
tematic uncertainties, drawn from distributions summa- 
rized in Table [I] or, in the case of the instrument beam 
uncertainties, using beam realizations described in Sec- 
tion 14. 2. 11 To minimize uncertainty in interpreting the 



tSZ result due to uncertainties in the assumed halo mass 
function, we repeat the fit using data in which all clus- 
ters above M 2 oo(Pcrit) = 8 x 10 14 M Q /h masked and a 
tSZ model template with the mass function truncated 
at that value. To determine how much of the tSZ bi- 
spectrum is coming from clusters not already used for 
cosmological constraints from cluster count analyses, we 
repeat the fit using data and model templates with no 
clusters above M 2 oo(/o C rit) = 3 x 10 14 M Q //i (see Section 
[4X21 for details). 

The results of the fit with no clusters masked are shown 
in Figure [1] and summarized in Table [21 The results of 
the fit with the two levels of cluster masking and mass 
function truncation are summarized in Tables [3] and [4[ 
The best-fit parameter values using the nominal values 
of the beam and other sources of systematic uncertainty 
(see Section 14.2.1)) arc shown in the tables, along with 
lcr statistical uncertainties (from the covariance matrix 
in the linear least-squares fit), lcr systematic uncertain- 
ties (from the scatter in best-fit parameter values over 
1000 realizations of systematic uncertainties), and the 
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quadrature sum of the two la uncertainties. The un- 
certainties on each parameter are the square root of the 
diagonal of the covariance matrix, i.e., the uncertainty 
of each individual parameter marginalized over the oth- 
ers. The parameter correlation matrix (statistical-only 
and statistical-plus-systematic) for the fit results with 
no cluster masking is shown in Table [SJ Full statistical 
and systematic error covariance matrices are download- 
able from the SPT website, as are the tSZ bispectrum 
templates and the 1000 beam realizations used in the 
fits. 

5.2.1. Best-fit thermal SZ amplitudes 

We discuss the cosmological implications of our tSZ 
bispectrum measurement in Section 16.3) here we briefly 
discuss the best-fit amplitudes at the three different 
masking levels, and we compare the best-fit amplitude 
with no masking to the measurement of the tSZ real- 
space three-point fu nction (skewness) in ACT data from 
IWilson et all d20H . 

First, we note that the best-fit amplitudes at each 
masking level (no clusters masked, clusters above 8 x 
10 14 Mq / h masked, clusters above 3 x 1O 14 M0 / h masked) 
relative to the theoretical prediction for that level of 
masking are statistically consistent with one another and 
indicate a lower tSZ bispectrum amplitude than pre- 
dicted by the fiducial model. The implications of this 
result are discussed in Section lOl The model predicts 
a tSZ bispectrum amplitude at I = 3000 and 152.8 GHz 
(the tSZ-weighted band center of the SPT 150 GHz band) 
of -9.8, -7.5, and -2.4 x lO" 11 /^ 3 for the three mask- 
ing levels. The best-fit results from Tables [2]|4] therefore 
translate to 150 GHz tSZ amplitudes of —5.3, —4.5, and 
— 1.7 x 10~ n /iK 3 for the three masking levels. We note 
that roughly 1/3 of the total signal is coming from clus- 
ters below th e mass threshold used for the cosmological 
constraints in lReichardt et all ()2013f ). implying that cos- 
mological constraints from the tSZ bispectrum do con- 
tain information beyond what is already measured using 
cluster counts. 

To compare our Fourier-domain three-point function 
(i.e., bispectrum) tSZ amplitude to the real-space three- 
point fu nction (skewn e ss) of the tSZ measured in ACT 
data by IWilson et al.l (12012ft. we first take th e /-space 
filter shown in Figure 1 of IWilson etafl (pOll and cal- 
culate the real-space skewness that should be observed 
in a map convolved with this filter, if our tSZ bispectrum 
template is correct. To calculate this, we create a three- 
dimensional bispect rum filter from the one-dimensional 
IWilson et all l|2012T ) filter (by defining F hisp (fa,fa,fa) = 
Fi d (fa)F ld (fa)F ld (fa)) 7 multiply the predicted tSZ bi- 
spectrum by t his three-dimensiona l filter , and calculate 
(T 3 ) following iKomatsu fc Spergel (|200l : 

( 32 ) 

hhh v 7 

Fbis P (h, fa, fa)B{l\,fa, fa)- 

To evaluate the Wigncr 3-j symbol, we use the high-/ 
approximation in Equation 8 of B12. 

The predicted skewness from our no-masking tSZ bi- 
spectrum t emplate mult i plied by the bispectrum ver- 
sion of the IWilson et all (|2012ft filter is -53.3 ^K 3 at 



152.8 GHz. At the ACT band center of 148 GHz, the 
template and filter predict —64.6 [iK 3 . Given the ampli- 
tude we measure (in no-cluster-masked data) relative to 
the theoretical prediction, and assuming the shape of the 
bispectrum theory template is correct, our bispectrum 
measurement corresponds to a real-space tSZ skewness 
at 148 GHz of -34.9 ± 3.3 ^K 3 , or -34.9 ± 7.8 /jK 3 
when we add the 20% sample variance uncertainty es- 
timated in Section 16.3.11 This is consistent with the 
value of —31 ± 14 fjK 3 (also including samp le variance 
uncertainty) measured in lWilson et al.l (|2012[) . Given the 
detection in this work of a sig nificant cluste r ed CI B bi- 
spectrum, it is likely that the IWilson et all (|2012f > tSZ 
skewness measurement is biased low by ~15% (see Sec- 
tion [1T2J] for details). If this is i ndeed the case , then the 
true central value measured by IWilson et al.l (|2012f ) is 
more like 36 /xK 3 , still consistent with the tSZ bispectrum 
measurement in this work. 

5.2.2. x 2 an d goodness-of-fit values 

The x 2 values from the fits using the three levels of 
cluster masking and the nominal values of beams and 
other sources of systematic uncertainty are summarized 
in Table |U The table includes values of absolute x 2 , 
reduced x 2 , an d Ax 2 between the best fit and the null 
hypothesis. The formal probabilities to exceed (PTEs) 
associated with the reduced x 2 f° r a U three levels of 
cluster masking are vanishingly small, but a small un- 
derestimate of the bispectrum variance would cause an 
otherwise good fit to have such a PTE. Since the x 2 of 
the bispectrum fit will scale as the amplitude of the map 
noise and simulated Gaussian sky signal to the —6 power, 
the observed x 2 excess is consistent with a percent-level 
misestimate in the noise or the Gaussian sky amplitude. 

The map noise used to estimate the bispectrum vari- 
ance is taken from the same data used to construct the 
real maps used to measure the bispectrum (see Section 
13.41 for details). Thus, we can calculate a x 2 from the 
"measured" bispectrum of every simulated map and use 
the scatter in x 2 across the simulations as a measure of 
how closely the estimated bispectrum variance from map 
noise should match the map-noise variance contribution 
to the real data. None of the 100 simulations had x 2 as 
high as the data, so it is unlikely that the excess x 2 is 
due to a map noise misestimate. On the other hand, it 
is plausible that the Gaussian sky amplitudes could be 
mismatched between the simulations and the data at the 
1% level. Our estimate for the amplitude of CMB fluc- 
tuations in SPT maps of this 800 deg 2 region is limited 
by cosmic variance and the uncertainty on our absolute 
calibration (which is 1-2% in temperature, see Section 
I4.2.1|) . while our estimate for the Poisson point-source 
power is limited by calibration and beam uncertainties 
and by the lack of high-precision measurements of the 
Poisson amplitude at millimeter wavelengths. 

Alternatively, the excess x 2 could be evidence of depar- 
tures from our models for either tSZ or clustered CIB. 
However, the total Ax 2 between the null hypothesis of 
zero bispectrum and the best-fit model is smaller than 
the difference between the x 2 of the best-fit model and a 
X 2 that would reduce to 1.0. Misestimates of the beam 
or filter transfer function could also be responsible. We 
can test this hypothesis by examining the x 2 values for 
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each systematic realization, and we do not find any trend 
of x 2 with beam realizations; in fact, the total spread in 
X 2 across all systematic realizations is roughly ±10, in- 
dicating that none of the identified sources of systematic 
uncertainty are responsible for the excess x 2 ■ Finally, the 
excess x 2 1S n °t strongly concentrated in one frequency 
band or region of Z-space. This points to a slight un- 
derestimate in the simulated Gaussian sky signal as the 
source of the excess x 2 ■ 

6. DISCUSSION 

In this section, we discuss the implications of each bi- 
spectrum component measured in the multi-band fit. We 
begin by comparing the amplitude of the Poisson point- 
source bispectrum in each band to model predictions; we 
then discuss the clustered CIB amplitude, both as an 
interesting signal in its own right and as a possible con- 
taminant to the tSZ amplitude; finally, we implement the 
analysis introduced in B12 and use the tSZ bispectrum 
amplitude constraint to measure as and to sharpen the 
kSZ amplitude measurement from R12. 

6.1. Poisson point-source component amplitudes 
vs. model predictions 

Given a model for the number of sources in a given 
flux interval per unit solid angle dN/dS/dQ, we can pre- 
dict the contribution to the bispectrum from the Poisson 
component of those sources. Wc can then compare these 
predictions to the results in Tables [20 as a test of the 
source models. Because the Poisson contribution to the 
bispectrum is weighted by the individual source fluxes 
cubed — compared to the source-flux-squared weighting 
in the power spectrum — this test is largely independent 
of power-spectrum-based tests of source models. And, 
because the bispectrum in this work is calculated with 
all sources detected above 5ct masked, the bispectrum 
constraints on models are nearly inde pendent of source- 
count constraints from the same data (jVieira et ai]|2010l 
L. Mocanu et al., in prep.). 

In Table [7l we show the predicted bispectrum 
amplitude in all three SP T bands from two mod- 
els of rad i o-loud sources (|De Zotti et al.l 120101 and 
iTucci etahl 120 111), two models of r adio-quiet dusty 
sourc es ( Bethermin et al] 120111 and IBethermin et al.l 
I2012I ). and the four possible combinations of these mod- 
els. We also repeat the measured values of the Poisson 
bispectrum component from Tables [2]|4] for comparison. 
In some cases, the model predictions are at the nom- 
inal SPT bands, in others, the predictions are for the 
analogous Planck bands; in either case, we transform 
the models to the appropriate effective SPT band cen- 
ter for that source family, using assumed spectral indices 
of «radio = —0.5 and ctrinstv = 3.5, consistent with the 
results of iVieira et al.l (|2010t) and R12. We also use this 
assumed spectral behavior to transform the 150 GHz flux 
cut to the other two bands. 

Two things are immediately clear from Table El The 
first is that only in the 150 GHz band is the bispectrum 
expected to contain a significant contribution from both 
families of sources: at 95 GHz, the dusty sources are 
expected to contribute < 5% of the total amplitude, 
while at 220 GHz, the radio-loud sources are expected 
to contribute < 1% of the total amplitude. The second 



is that, while the IBethermin et al.l (|2011f ) model predic- 
tion is within l-2er of the measurement at 220 GHz, there 
are significant differences between the model predictions 
and the measured amplitudes in all other cases. 

Wc first investigate whether this discrepancy between 
measured and predicted Poisson bispectrum amplitudes 
could be due to effects that are not included in the 
measured uncertainty, in particular sample variance and 
the effect of a varying flux cut. Near the ~6 mJy (at 
150 GHz) flux cut used in this work, t he dependence 
of radio source counts on flux is sha llow (iDe Zotti et al.l 
[20T0t IVieira et al.ll20Tot ITucci et al.ll2011t L. Mocanu et 
al., in preparation). This means that the radio-source bi- 
spectrum will be dominated by the brightest (and rarest) 
sources just below the flux cut. This will tend to make 
the radio source bispectrum more sensitive to sample 
variance and to the fact that, while the real flux cut 
used in this work varies from field to field by ^10%, we 
calculate the predicted bispectrum from source models 
using a single flux cut. We estimate the magnitude of 
both of these effects by simulated observations of many 
800-deg 2 patches of sky containing sources drawn from 
the source count models listed in Table [7] In one set 
of simulated observations, we use the nominal 6 mJy 
150 GHz source cut in every field; in another set, we use 
the actual 150 GHz source cut used in each individual 
field in this work; these cut levels range from 5.7 to 6.6 
mJy. Both the bispectrum sample variance (calculated 
as the scatter among the best-fit Poisson bispectrum in 
all simulated observations) and the effect of the different 
flux cuts were largest at 95 GHz — not surprising, given 
that this is the band in which the radio source contribu- 
tion is largest — but even in that band, the square root of 
the sample variance was only 2% of the average Poisson 
bispectrum, and the difference between using the true 
flux cut for every field and using the nominal flux cut 
was only 6%. We conclude that the discrepancy between 
model predictions and the measured Poisson bispectrum 
cannot be explained by sample variance and varying flux 
cuts. 

The simplest modifications to the source models that 
would bring them in line with the bispectrum results in 
this work would be: 1) to steepen the spectral behav- 
ior used to extrapolate the dusty source behavior from 
higher frequencies to the SPT bands, thus reducing the 
dusty-source bispectrum by a small amount at 220 GHz 
and a larger amount at 150 GHz; and 2) to assume a 
slightly shallower spectral index in extrapolating counts 
at radio frequencies to 95 GHz. It remains to be seen 
whether such modifications would be compatible with 
constraints from other measures of point-source behav- 
ior such as number counts and the power spectrum. An 
interesting possibility for future work is to combine these 
probes into a simultaneous constraint on source models. 

6.2. The clustered CIB bispectrum 

Measurements of the two-point function of CIB clus- 
tering (either the real-space two-point angular correla- 
tion function or the angular power spectrum) are cur- 
rently providing key constraints on the relationship be- 
tween star-forming galaxies and their dark-matter halos, 
or, equivalcntly, on the relationship betwee n luminosity 
and m ass in star-forming galaxies (see, e.g.. IViero et alj 
120121 and references therein). Equally interesting will 
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TABLE 2 

Multi-band fit results, No cluster masking 



Template 


best-fit 


noise 


systematic 


quadrature 


detection 


constraint 




value 


error (lcr) 


error (lcr) 


sum 


significance 


significance 


tSZ, relative to analytical prediction 


0.54 


0.04 


0.03 


0.05 


13 


10 


Clustered CIB [10" VK 3 at I = 2000 and v = 220 GHz] 


0.68 


0.13 


0.06 


0.14 


5.2 


4.7 


Poisson, 95 GHz, [10~ 1( VK 3 ] 


1.54 


0.16 


0.09 


0.18 


9.7 


8.5 


Poisson, 150 GHz, [lO -11 ^ 3 ] 


1.14 


0.09 


0.09 


0.13 


12 


9.0 


Poisson, 220 GHz, [lO -10 ^ 3 ] 


1.89 


0.26 


0.24 


0.35 


7.2 


5.3 



Parameter best-fit values and lcr statistical and systematic uncertainties. "Detection significance" refers to the best-fit parameter value 
divided by statistical uncertainty only; "constraint significance" refers to the best-fit parameter value divided by the q uadrature sum 
of statistical and systematic uncertainty. Sample variance is not included in the constraint significance; see Section 16.31 and Table [8] for 
tSZ results including sample variance. 



TABLE 3 

Multi-band fit results, M2oo(pcrit) > 8 x 10 14 Mq/Ii clusters masked 



Template 


best-fit 


noise 


systematic 


quadrature 


detection 


constraint 




value 


error (lcr) 


error (lcr) 


sum 


significance 


significance 


tSZ, relative to analytical prediction 


0.60 


0.05 


0.04 


0.07 


11 


9.2 


Clustered CIB [10" 9 /iK 3 at I = 2000 and u = 220 GHz] 


0.75 


0.13 


0.07 


0.15 


5.6 


5.0 


Poisson, 95 GHz, [lO -10 /^ 3 ] 


1.63 


0.16 


0.10 


0.19 


10 


8.8 


Poisson, 150 GHz, [lO" 11 ^ 3 ] 


1.28 


0.10 


0.10 


0.14 


13 


9.3 


Poisson, 220 GHz, [lO -10 /^ 3 ] 


1.78 


0.26 


0.21 


0.34 


6.8 


5.3 



See Table [2] for notes. 



TABLE 4 

Multi-band fit results, M2oo(p C rit) > 3 x 10 14 Mq/Ii clusters masked 



Template 


best-fit 
value 


noise 
error (lcr) 


systematic 
error (lcr) 


quadrature 
sum 


detection 
significance 


constraint 
significance 


tSZ, relative to analytical prediction 


0.71 


0.17 


0.06 


0.18 


4.2 


3.9 


Clustered CIB [10~ 9 /iK 3 at I = 2000 and u = 220 GHz] 


0.75 


0.13 


0.07 


0.15 


5.6 


5.0 


Poisson, 95 GHz, [lO -10 /* 3 ] 


1.68 


0.16 


0.11 


0.19 


10 


8.7 


Poisson, 150 GHz, [10 _1 VK 3 ] 


1.30 


0.10 


0.10 


0.14 


13 


9.2 


Poisson, 220 GHz, [10 _1( VK 3 ] 


1.71 


0.27 


0.21 


0.34 


6.4 


5.1 



See Table [2] for notes. 



TABLE 5 TABLE 6 

Parameter correlation matrices for multi-band fits X 2 F0R multi-band fits 



Stat, only 

tSZ CIB P95 P150 P220 

tSZ 1.00 0.32 0.39 0.17 -0.28 

CIB 0.32 1.00 0.12 -0.61 -0.88 

P95 0.39 0.12 1.00 0.07 -0.11 

P150 0.17 -0.61 0.07 1.00 0.54 

P220 -0.28 -0.88 -0.11 0.54 1.00 

Stat. + syst. 

tSZ CIB P95 P150 P220 

tSZ 1.00 0.36 0.55 0.35 0.03 

CIB 0.36 1.00 0.20 -0.35 -0.41 

P95 0.55 0.20 1.00 0.23 0.06 

P150 0.35 -0.35 0.23 1.00 0.32 

P220 0.03 -0.41 0.06 0.32 1.00 



Correlation between parameters for the multi-band fit with no 
clusters masked. The top table shows the normalized elements 
of the parameter covariance matrix = C^,^/ ^C^pUZZ for 
statistical covariance only; the bottom table shows the same 
quantities f or the sum of statistical and systematic covariance 
(see Section l4,2.1l for details on the calculation of systematic co- 
variance). The parameter labels refer to thermal SZ amplitude, 
clustered CIB amplitude, and Poisson point source amplitudes 
in each of the three bands. 

be constraints on this relationship from the clustered 
CIB bispectrum. As is the case for tSZ and the Poisson 
point-source component, the relative weighting of sources 
that contribute to the clustered CIB power spectrum 



Cluster masking \^ x 2 > null hypothesis Ax 2 xj a d 

None 51697.9 52752.5 -1054.6 TTo 

8xl0 14 Af Q //i 51644.3 52761.7 -1117.4 1.10 

3 x lO 14 M /fe 50853.9 52068.7 -1214.8 1.08 

X 2 for simultaneous fits to bispectrum data in all three bands 
with three levels of cluster masking. Also shown is the x 2 f° r 
the null hypothesis, the A\ 2 between the null hypothesis and 
the full model, and the reduced x 2 for the full model (for 47,005 
degrees of freedom). 

and bispectrum will be different — with the bispectrum 
generally sensitive to brighter sources because of the S 3 
weighting — implying that the two probes can give inde- 
pendent constraints on models of the mass-luminosity 
relationship. 

There are currently no physically motivated predic- 
tions for the clustered C IB bispectrum in the literature — 
although lLacasal (|2012f ) present a heuristic model based 
on power spectrum measurements, which we adopt as 
our fitting template. However, any model of the mass- 
luminosity relationship of star-forming galaxies that can 
predict the clustered CIB power spectrum should also be 
able to predict the bispectrum, and we expect the mea- 
surement of the clustered CIB bispectrum in this work 
to provide new constraints on such models. 

For now, our main conclusions about the clustered CIB 
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TABLE 7 

POISSON SOURCE COMPONENT RESULTS VS. MODEL PREDICTIONS 



Measured Poisson bispectrum amplitudes 








Masking level 


95 GHz value 


150 GHz value 


220 GHz value 




[10- 10 AtK 3 ] 


[io- 11 ^ 3 ] 


[10- 10 mK 3 ] 


No cluster masking 


1.54 ±0.18 


1.14 ±0.13 


1.89 ±0.35 


-^200 (Pcrit ) > 8 X 10 14 Mq//i clusters masked 


1.63 ±0.19 


1.28 ±0.14 


1.78 ±0.34 


M20O (Pcrit ) > 3 X 10 1 M(?)/h clusters masked 


1.68 ±0.19 


1.30 ±0.14 


1.71 ±0.34 


Model Predictions 








Model 


95 GHz value 


150 GHz value 


220 GHz value 




[10- 10 AtK 3 ] 


[io^Vk 3 ] 


[W- 10 fiK 3 ] 


de Zotti radio 


0.685 


0.563 


0.018 


Tucci radio 


0.484 


0.385 


0.011 


Bethermin (2011) dusty 


0.023 


3.285 


2.264 


Bethermin (2012) dusty 


0.016 


1.878 


2.900 


de Zotti + Bethermin (2011) 


0.708 


3.849 


2.282 


de Zotti + Bethermin (2012) 


0.701 


2.441 


2.918 


Tucci + Bethermin (2011) 


0.507 


3.670 


2.275 


Tucci + Bethermin (2012) 


0.500 


2.263 


2.911 



Comparison of measured single-band Poisson point-source bispectrum amplitudes with predictions from source count models. In the 
upper section, measured values of the Poisson source-component bispectrum amplitudes — and ltr statistical-plus-systcmatic errors on 
those values — are shown for the three levels of cl uster masking. In th e lo wer section, pred icted Poisson bispectrum amplitudes are 
shown for two models of radio -lou d source counts { De Zotti et al. 2010 and Tucci et al. 201l|), two models of dusty, radio-quiet source 
counts (Bethermin ct al. 2011 and Bethermin ct al. 20121). and combinations thereof. 



bispectrum are that it is clearly detectable in 800 deg 2 of 
220 GHz data at SPT noise levels, and that the angular 
shape of the signal is fit reasonably well b y a pure power 
law. Our model, based on the ansatz of lLacasal ()2012l ) 
and described in Section [4.1.31 has £? c iust('i, h, h) oc 
jn/2jn/2jn/2 ^ n = \ 2 and scales with observing fre- 
quency as f 3a , with a = 3.72. As shown in Figure[TJ this 
is by eye a reasonable fit to the data. Although the PTE 
associated with the reduced x 2 is vanishingly small, as 
discussed in Section I5.2.2[ this is consistent with a very 
small noise misestimate. Neither changing the power-law 
index of the angular shape of the signal nor changing the 
assumed frequency scaling results in significant improve- 
ments in x 2 (I Ax 2 1 < 2 for a 2er shift in a or a 30% 
change in the power- law index). For the fiducial model, 
using the results of the no-cluster-masking multi-band 
fit, the amplitude of the clustered CIB bispectrum at 
I = 3000 and 219.6 GHz (the CIB-weighted band center 
of the SPT 220 GHz band) is (3.22 ±0.68) x 10" 10 /^K 3 . 
Using the > 8 x 10 14 M Q //i masking result, the ampli- 
tude of the clustered CIB bispectrum at I — 3000 and 
219.6 GHz is (3.52 ±0.70) x 10" 10 fiK 3 . 

6.2.1. The clustered CIB as a contaminant to the thermal 
SZ bispectrum 

The clustered CIB bispectrum that we detect in the 
SPT 220 GHz band will also contribute to the bi- 
spectrum at 150 GHz (and, to a much lesser extent, 
at 95 GHz). With the assumed frequency scaling of 
B oc ^ 3ct , with a = 3.72, the ratio of clustered CIB bi- 
spectrum amplitude in the 150 and 95 GHz bands com- 
pared to that in the 220 GHz band should be 0.031 and 
0.001, respectively, meaning we would expect roughly 
1 x 10" 11 fiK 3 of clustered CIB bispectrum at I = 3000 
and 150 GHz (compared to an expected tSZ bispectrum 
of —5.4 x 10~ n /uK 3 ) and a negligible contribution of 
< 10~ 12 ^K 3 at 95 GHz. This implies that, if we were to 
neglect the clustered CIB, we would underestimate the 
tSZ bispectrum amplitude by roughly 20% at 150 GHz 



(because the bispectrum shape of the tSZ and clustered 
CIB are similar, but with different polarities). If we fit 
both the 95 and 150 GHz bispectra individually to a 
two-component model consisting of tSZ and a Poisson 
point-source term, the results are as expected. The best- 
fit tSZ amplitude with no clusters cut from the 95 GHz 
data alone (£?tsz = 0.54 ± 0.07) is consistent with the 
multi-band fit results (B tSZ = 0.54 ± 0.04, cf. Table EJ, 
but the best-fit tSZ amplitude from the 150 GHz data 
alone (B tSZ = 0.43 ±0.05) is -20% lower than the multi- 
band result. 

The multi-band fit properly accounts for this, and if 
the CIB behavior were very different from what we as- 
sume in the model, this would manifest in a noticeably 
poorer \ 2 in the multi-band fit relative to single-band 
fits, which we do not see. In particular, if there were a 
significant level of tSZ-CIB correlation in the bispectrum, 
we would expect to see a very different best-fit tSZ am- 
plitude from the multi-band fit from what we obtain 
with the 95 GHz-only fit; in fact, our cosmological con- 
straints detailed below would not substantively change 
if we used only 95 GHz data in the fit, although the er- 
ror bars would increase somewhat. We conclude that, at 
the current level of statistical precision, we see no evi- 
dence that our model of the CIB is inadequate or that 
the CIB is significantly biasing our measurement of the 
tSZ bispectrum. More complicated models involving spa- 
tial correlations between the sources of the tSZ and CIB 
bispectra, different CIB bispectrum shapes, and spectral 
behavior beyond a single spectral index will be explored 
in future analyses which include measurements of the 
cross-bispectra among the three SPT bands (in addition 
to the auto-bispectra considered here). 

As noted in Section 15.2. 1[ according to our model 
an d fit results, the 148 GHz tSZ skewness measurement 
of iWilson etaLl ([20121 ) in ACT data is likely to be bi- 
ased low by roughly 15% (less than the 20% we see at 
152.6 GHz, the CIB-weighted band center of the SPT 
150 GHz band, due to the very steep frequency scaling 
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of the CIB). This is slightly smaller than the statisti- 
cal + sample variance uncertainty in that result — and 
significantly smaller than the difference in tSZ skcwness 
predicted by the range of ICM models they consider. 

6.3. Cosmological interpretation of the thermal SZ 
bispectrum amplitude 

In this section, we use our measurement of -Btsz, the 
tSZ bispectrum amplitude, to place a constraint on erg 
and to predict the tSZ power spectrum amplitude, A t sz- 
We use this A t sz prediction to break degeneracies be- 
tween tSZ and kSZ in measurements of the CMB power 
spectrum. First, however, we need to estimate two prop- 
erties of the Atsz and -Btsz distributions, namely the 
sample variance of -Btsz and the covariance of A t sz with 
-Btsz over the same patch of sky. 

6.3.1. Sample variance in the measurement of Btsz 

To estimate the sample variance contribution to our 
measurement of -Btsz, we use a set of cosmological sim- 
ulations. These simulations use the same gas physics 
prescription, gas physics parameter settings, and cosmo- 
logical parameter settings that went into the template 
predictions used in the model fitting procedure described 
in Section 14.21 The simulations cover an octant of sky, 
from which we extract 40 independent 100 deg 2 fields. 
We run the bispectrum estimator over these fields with 
the same weighting used in running the estimator on the 
150 GHz data. We fit each of the resulting 40 bispectrum 
measurements to the predicted template, again using the 
weights from the 150 GHz data and restricting the fit to 
I < 4000 to roughly account for the effects of the Pois- 
son point-source component in the fit to the data. The 
calculation is performed with ten levels of cluster mask- 
ing, ranging from no masking to masking clusters above 
M 2 00(Pcrit) > 2 x 10 14 Af Q //i. 

We then estimate the scatter in 800 deg 2 regions for 
each cluster masking level by grouping the 40 ampli- 
tudes into five independent groups of eight amplitudes, 
averaging each group, and calculating the scatter among 
groups. This is a noisy estimate of the sample variance. 
In particular, the sample variance as a function of mask 
threshold is affected by the masking of individual clus- 
ters at high enough mass that only a few such clusters 
exist in the entire octant simulation. For this reason, 
we fit a smooth function to the measured sample vari- 
ance as a function of masking, and we report our sam- 
ple variance as the best-fit value at the three masking 
levels used for the data. The fractional scatter in B t sz 
due to sample variance is estimated to be 0.20, 0.15, 
and 0.06 at the three levels of cluster masking (none, 
> 8x 10 14 M Q /h, > 3x 10 14 M o //i) used for the data. We 
note that the value for the no-masking case is consistent 
wi th the sample varian ce of the tSZ skewness measured 
bv lWilson et all (|201%_ given the relativ e sky coverage 
of the two analyses. (Wilson et al.l 120121 measured 41% 
scatter for 239 deg 2 as compared to 20% for 837 deg 2 in 
this work.) 

As discussed in Section l4.1.2l the sample variance and 
statistical uncertainty on -B t sz go in opposite directions 
as more clusters are masked, implying that there is an op- 
timal mass threshold, at which level the total statistical 
+ systematic + sample -variance error on Btsz is min- 
imized. Table [8] shows the fractional uncertainty from 



each of these sources (and quadrature combinations of 
them) for the three masking levels used for the data. 
Among these three masking levels, the total uncertainty 
is smallest when clusters above 8x 10 14 M Q /h are masked. 
We interpolate between these masking values, using the 
smooth fit to the sample variance discussed above and 
scaling the fractional statistical error by the best-fit -B t sz 
in the simulations at each of the ten masking levels used 
in the sample variance calculation. This calculation im- 
plies that the total fractional scatter would be smallest 
with a cut at 6 x 10 14 Mq//i, but that the fractional scat- 
ter at this cut would be ~18%, as compared to ~19% 
at 8 x 10 14 M©//i. This gain in signal-to-noise is mini- 
mal and probably below the fidelity of our calculation, 
so we use the 8 x 10 14 Mq//i cut values of B t sz for our 
cosmological results. 

6.3.2. Covariance between Atsz and Btsz 

We estimate the covariance b etween Atsz and -Btsz 
using the halo- model approach of lKavo et al.l (|2013| ). to- 
gether with the gas physics prescription of B12. We find 
that the square root of the fractional covariance between 
Atsz with no clusters cut and -B t sz with clusters above 
8 x 1O 14 M //i cut is ^0.06. This is small compared 
to the other sources of uncertainty in our prediction of 
Atsz, and we ignore it for this analysis. Details of the 
A-tsz/Btsz covariance calculation are given in the Ap- 
pendix. 

6.3.3. A (78 constraint from the thermal SZ bispectrum 

Taking into account the full uncertainty (statistical 
+ systematic + sample variance) on our measurement 
of B t sz, and adding an additional 33% theory uncer- 
tainty as estimated in B12, we use Equation 18 of B12 to 
transform our B t sz measurement into a constraint on 
og- We use the best- fit B t sz with all clusters above 
^20o(Pcrit) = 8 x 10 14 M Q //i masked, and we use the 
sample variance estimated for that mass cut. The scal- 
ing of Btsz with cosmological parameters changes slightly 
when the most massive clusters are not included in the 
data and the model prediction. For a six-parameter flat 
ACDM model, we find the cosmological scaling of B t sz 
with clusters above 8 x 10 14 M Q /h cut to be 

Btsz (M 20 oOcrit) < 8 x 10 14 M & /h) (33) 

/^y i / nb y-« 2 / h y- 2 s 

K V0.8/ \0.0A5J \0.7l) 

(j^y 1 - 12 f jM~°- ar 

X V0.97/ VO-27/ 

(with no measurable dependence on t). Compared to 
the no-cut scaling in B12, the primary difference is a 
slightly shallower scaling with <7s(-Btsz oc o-g 1 - 6 in B12). 
We marginalize over all cosmological parameters other 
than erg. Although the dependence of B t sz on erg is 
far stronger than on the other parameters, the depen- 
dence on fib is stro ng enough that we place a WMAP7 
(lLarson et al.ll201lD prior on Vt^h 2 and a prior on h from 
iRiess et al.l (|201li r The resulting constraint on as is 

cr 8 = 0.786 ± 0.031. (34) 

The uncertainty on our determination of <7s is domi- 
nated by the assumed 33% modeling uncertainty. Given 
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TABLE 8 

Thermal SZ amplitude constraints including sample variance 



Masking 


best- fit 


fractional 


fractional 


stat. + 


fractional sample- 


stat. + syst. + 


tSZ amplitude 


stat. error 


syst. error 


syst. 


variance error 


sample- variance 


No cluster masking 


0.54 


0.08 


0.06 


0.09 


0.20 


0.22 


A^20o(Pcrit) > 8 X 10 14 MQ/h clusters masked 


0.60 


0.09 


0.06 


0.11 


0.15 


0.19 


(Pcrit) > 3 X 10 14 Mq/H clusters masked 


0.71 


0.24 


0.09 


0.25 


0.06 


0.26 



Best-fit values for the tSZ amplitude and fractional la statistical, systematic, and sample-varianc e unce rtainties on that quantity (and 
combinations of these in quadrature). Sample- variance errors are estimated as described in Section l6.3.1l Values are given for each of the 
three cluster mask thresholds (no masking, clusters with M200 (Pcrit ) > 8 xlO 14 Mq/H masked, clusters with M2oo(Pcrit) > 3xlO 14 M0/h 
masked). Best- fit amplitudes are given relative to the theoretical prcdiciton for that masking level. 




Fig. 2. — One-dimensional posterior probability distributions from R12 for A t gz (left) and A^gz (right), in the case in which the spatial 
cor relation between tSZ and CIB was a free parameter in the R12 fits, before and after applying the bispectrum-bascd prior in Equation 
1371 The black (solid) line shows constraints with no bispectrum information added; the blue (dotted) line shows the constraints assuming 
the realistic 8% modeling uncertainty in A t gz for fixed B t szi the red (dashed) line shows constraints assuming the extreme 15% modeling 
uncertainty. In all cases, adding constraints from the bispectrum data improves the power-spectrum-only constraints. 



the steep scaling of erg with -Btsz and the mild depen- 
dence on other parameters, in the absence of modeling 
uncertainty, we would expect to achieve a ~2% con- 
straint on tT8 from our 19% constraint on i?tsz, compared 
to the 4% constraint achieved when modeling uncertainty 
is included. 

This constraint on erg from the tSZ bispectrum is 
comparable in significance to, and statistically consis- 
tent with, other recent determinations of er 8 from tSZ 
and/or the primary C MB. From the prima ry CMB, in 
a flat ACDM model, IHinshaw et"all (|2012l ) find cr 8 = 
821± 0.023from WMAP9 data alone, while lStorv etaTl 
(|2012D find erg = 0.795 ± 0.022 when adding SPT con- 
straints from the primary CMB damping tail to WMAP7 
data. Adding constraints from the tSZ power spec- 
trum to WMAP7 plu s earlier damping-tail constraints, 
IShirokoff et~aT1 (|2011D obtain a constraint on er g with sta- 
tistical precision at the ±0.01 level, but which varies in 
best-fit value from 0.77 < erg < 0.80 depending on th e 
theory template used. Simlarly, iDunklev et al.l (|2011l ). 
using only tSZ power spectrum data, obtain a ±0.05 
constraint (statistical only) but find best-fit values from 
0.74 < eg < 0.79, depending on the choice of theory tem- 
plate. Adding SPT cluster counts to WMAP7 and the 
iKeisler et al.l (|2011l ) SPT measurement of the damping 
tail, and marginalizing over uncertainty in the X-ray- 
calibr at ed tSZ-mass sca l ing re lation from iBenson et al.l 
IpoTl . IReichardt etUI (pi! find cr 8 = 0.798 ± 0.017. 



Combining WMAP7 with ACT-detccted clusters and 
marginalizing over uncert ainty in a dynamical-m ass- 
calibrated scaling relation, rHassclfic ld et al.l (|2013f ) find 
cr 8 = 0.829 ± 0.024. 

Finally, in the analys is most direct l y com parable to 
the one presented here, I Wilson et al.l ()2012[ ) find erg = 
0.79 ± 0.03 from a measuremen t of the tSZ r eal-sp ace 
three-point function (skewness). iWilson et al.l (|2012f ) do 
not explicitly marginalize over theory uncertainty, but 
they obtain erg constraints using different gas model pre- 
scriptions and find that, even for the extreme case of 
turning off cooling, feedback, and star formation, the 
constraint on erg changes by less than la. The agreement 
between the SPT and ACT constraints on erg from the 
tSZ three-point function is not surprising, given the con- 
sistency between the measured Fourier-domain and real- 
space three-point amplitudes discussed in Section 15.2.11 
This consistency is worthy of note, however, given the 
very different analysis techniques leading to the two con- 
straints and the different regions of the sky measured. 

6.3.4. Predicting A t sz and sharpening A^sz 

Using the approach of B12, we now convert our con- 
straint on Btsz, the amplitude of the tSZ bispectrum, 
into a prediction for A t sz, the amplitude of the tSZ power 
spectrum. We then use that prediction to sharpen the 
R12 constraint on A^sz, the amplitude of the kSZ power 
spectrum. 
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We do not apply any cluster cut to the bispcctrum- 
derived prediction for A t sz or to the measurement of 
Atsz from SPT power spectrum data (which we take 
directly from R12). As detailed in B12 (and refer- 
ences therein), the tSZ power spectrum is dominated 
by lower-mass halos, so the mass-function uncertainties 
at the high-mass end are not as important for the tSZ 
power spectrum as they are for the tSZ bispectrum. 
We do, however, find a slightly different scaling be- 
tween v4 t sz(no cut) and B t sz(cut above 8 x 10 14 M Q /h) 
than between A t sz(no cut) and -B t sz(no cut). Specifi- 
cally, B12 found 

-Btsz (no cut) oc Aj sz (no cut), (35) 

while we find 

B tsz (cut above 8 x 10 14 Af Q //i) oc A^ 3 (no cut). (36) 

We also find a slightly different uncertainty in our 
prediction of A t sz(no cut) given B t sz(cut above 8 x 
1O 14 M //i), namely 8% for realistic gas physics param- 
eters and 15% for the extreme case (as compared to 7% 
and 15% for A t $z at fixed B t sz with no cluster cut for 
either quantity). 

R12 report A t sz and Aj<sz in terms of the power (when 
expressed as D t =1(1 + l)/(2n)Ci) at I = 3000. Using 
the best-fit bispectrum tSZ amplitude measurement with 
clusters above 8 x 10 14 Mq/H masked (including statisti- 
cal, systematic, and sample- variance uncertainties), the 
prediction for A t sz using the realistic 8% modeling un- 
certainty is 

A tsz (predicted from B tsz ) = 2.96 ± 0.59 /iK 2 . (37) 

Increasing the modeling uncertainty to 15% increases this 
error bar to ±0.70 fiK 2 . 

We create a Gaussian prior from this prediction and 
use it to importance-sample the posterior probability dis- 
tributions from R12. In that work, A t sz and A^sz were 
estimated using two different assumptions about the spa- 
tial correlation between tSZ and CIB: 1) assuming zero 
correlation; 2) assuming a single correlation coefficient 
independent of angular scale and allowing that coefficient 
to be a free parameter. If we importance-sample the zero- 
correlation result from R12, the improvement from the 
bispectrum prior is small. In the case of tSZ-CIB correla- 
tion as a free parameter, however, the bispectrum prior 
reduces the tSZ uncertainty by nearly a factor of two 
and results in a posterior Aksz distribution with a clear 
non-zero peak (see Figure [U right panel). There is some 
modeling inconsistency in using the Btsz constraint from 
this work, which is derived assuming no tSZ-CIB corre- 
lation, to improve the R12 A t sz constraint derived with 
tSZ-CIB correlation as a free parameter. However, as de- 
tailed in Section El. 1.31 we expect tSZ-CIB correlation to 
affect the bispectrum far less than the power spectrum, 
such that we can ignore the effects of tSZ-CIB correlation 
on our measurement of Btsz- 

The bispectrum-informcd constraints on A t sz and 
^ksz from R12, with t SZ-CIB correlatio n as a free pa- 
rameter (and using the lShaw et al.ll2012l cooling + star 



formation template for kSZ), are 

Atsz = 3.09 ± 0.52 fiK 2 (38) 
A kSZ = 2.9±1.5 nK 2 
A ksz <5.5 /iK 2 (95%). 

These results are fairly insensitive to modeling uncer- 
tainty: using the extreme 15% modeling uncertainty in- 
stead of the more realistic 8% value increases the la er- 
ror on Atsz to ±0.60 /iK 2 and the la error on Aksz to 
±1.6 fjK 2 . 

Before adding the bispectrum constraint, the R12 la 
uncertainty on A t sz was 1.05 /iK 2 , the 95% upper limit 
on Aksz was 6.7 /iK 2 , and the peak of the Aksz dis- 
tribution was within la of zero. The addition of the 
bispectrum constraint reduces the error of A t sz by a fac- 
tor of two compared to the power spectrum constraints 
alone. In turn, this improves the constraints on Akg Z , 
reducing the upper limit by 20%, and providing nearly 
2a evidence for non-zero kSZ. 

6.3.5. Prospects for the full 2500 deg 2 survey 

The full SPT-SZ survey comprises 2500 deg 2 of 95, 
150, and 220 GHz data at noise levels comparable to the 
800 deg 2 subset used in this analysis, and work is ongoing 
to produce the data and simulation products necessary to 
measure the small-angular-scale power spectrum and bi- 
spectrum in the full survey. The statistical uncertainty 
and the sample-variance uncertainty on £?tsz from the 
full survey should be roughly a factor of a/3 lower than 
the corresponding values in this work, simply from the 
larger sky coverage. The systematic uncertainty is not 
expected to change, but in the 800 deg 2 result, the sta- 
tistical + systematic + sample variance uncertainty is 
dominated by the sample variance contribution in both 
the no-cluster- masking and > 8 x 10 14 M Q //i masking 
cases; this total uncertainty is also expected to decrease 
by nearly V3. For the > 8 x 10 14 M Q //i masking case, 
this would result in a ~12% constraint on -Btsz- 

Because the constraint on as from -Btsz is already lim- 
ited by the assumed 33% modeling uncertainty, we do not 
expect a measurable improvement in the ag constraints 
from the full 2500 deg 2 survey, unless significant progress 
is made in measuring pressure profiles of the clusters re- 
sponsible for the tSZ bispectrum. To achieve a lower sta- 
tistical + systematic + sample variance uncertainty using 
the 2500 deg 2 result, we would need to reduce the theory 
uncertainty by roughly a factor of two. This is an ambi- 
tious goal; however, the amount of X-ray and millimeter- 
wave data on high-mass clusters at all redshifts is increas- 
ing rapidly, with X-ray programs such as Chandra obser- 
vations of 80 SPT-discovered clusters at 0.4 < z < 1.2 
(B. Benson et al., in prep.) and millimeter-wave pressure 
profile measuremen ts from such instruments as Bolocam 
(|Savers et al.| [2012). the Combined Array for Researc h 
in Millimete r-wave Astronomy (e.g., iPlagge et al.ll2012T) . 
and Planck (jPlanck Collaboration et al.ll2012l )~ 

Even with no improvement in modeling uncertainty, 
the 2500 deg 2 measurement of -B t sz will improve our 
ability to separate A t sz and Aksz in the power spec- 
trum. Because the relationship of A t sz to -B t sz is con- 
strained far better than either one individually, our cur- 
rent bispectrum-derived constraint on A t sz is limited by 
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sample variance. Reducing the full statistical + system- 
atic + sample variance uncertainty on B t sz to ~12% 
will result in uncertainties of ~ 0.4 /^K 2 on A t sz and 
~ 1.0 /iK 2 on Aksz, assuming 8% modeling uncertainty. 
If the current best-fit value of A^sz turns out to be cor- 
rect, these constraints will result in nearly a 3er detection 
of the kSZ effect. The addition of 100 deg 2 of already 
collected Herschel-SPIKE submillimeter data (program 
OTl_jcarls01_3, PI: Carlstrom) will provide strong con- 
straints on the behavior of the CIB, which in turn will 
further tighten the tSZ and kSZ constraints. Full-survey 
SPT power spectrum + full-survey SPT bispectrum + 
100 deg 2 SPIRE constraints on the kSZ arc expected to 
be at the < 0.5 /xK 2 level. These constraints will lead to 
unprecedente d limits on the re ionization history of the 
universe (e.g., IZahn et al J 120121 ) . 

7. CONCLUSIONS 

We have used 800 deg 2 of multi-frequency data from 
the SPT-SZ survey to make a high-significance detec- 
tion of the Fourier-domain angular three-point func- 
tion, or angular bispectrum, of the small-angular-scale 
(1' S £ 10', 1000 < I < 10,000) millimeter-wave sky. 
A bispectrum signal model that includes contributions 
from the thermal SZ effect, the clustered cosmic infrared 
background, and the spatially uncorrelated (or Poisson) 
point-source signal in each of the three bands provides 
a reasonable fit to the data. The tSZ bispectrum is de- 
tected at >10cr, the Poisson point-source component is 
detected in each band individually at ^6 to ^12cr, and 
the clustered CIB bispectrum is detected at >5er. This 
is the first detection of the clustered CIB bispectrum. 

We have compared the measured Poisson point-source 
bispectrum in each band to predictions from source mod- 
els. We find that no combination of models of radio-loud 
and dusty, radio-quiet sources can reproduce the mea- 
sured Poisson bispectrum amplitudes, implying that bi- 
spectrum measurements can provide interesting new con- 
straints on source models. 

Applying the m ethod s originally presented in 
iBhattacharva et all (|2012l . B12), we have used the 
measurement of -Btsz, the amplitude of the tSZ bi- 
spectrum to constrain erg, the normalization of the 
matter power spectrum, and to predict A t sz, the 
amplitude of the tSZ power spectrum. The constraint 
on o"8 using just SPT bispectrum data and a prior on 
fib is o"8 = 0.786 ± 0.031. This constraint is competitive 



with, and statistically consistent with, other recent 
measurements. Our bispectrum-derived prediction for 
At?,7, i combined with the power spectrum results of 
iReichardt et al.1 (|2012l . R12), results in the first evidence 
for a non-zero kinematic SZ power spectrum, with 
A kSZ = 2.9 1± 1.5 fiK 2 . 

In addition to constraining cosmology and models of 
source emission, these measurements of the small-scale, 
secondary-anisotropy- and foreground-dominated bispec- 
tra provide valuable constraints on potential contamina- 
tion to measurements of the primordial CMB bispectrum 
on larger scales, such as those expected soon from the 
Planck team. 
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APPENDIX 

In this work, we predict the tSZ power spectrum amplitude from the measured tSZ bispectrum amplitude, and 
we combine that prediction with the R12 measurement of the total SZ power spectrum. Our analysis includes the 
measurement uncertainties of the power spectrum and the bispectrum, but we assume that the covariance between 
the two observablcs is negligible. Here we compute the covariance between the power spectrum and the bispectrum 
and show that our assumption is justified. 

The covariance between the bispectrum (B) and the power spectrum (C) is given by (|Kavo et al J 1 20131) 

Cov[C(l 4 )B(h,l 2 , l 3 )} = 6 hh p- C(h)B(h,l 2 , l 3 ) + 2 perms. + -j- / ^T 5 (l 4 , -l A , h,l 2 , l 3 ; 0), (1) 

where £l s is the survey area, T5 is the tSZ five-point function, and </> is the angle between I4 and l\. Since we use the 
combined measurements of the power spectrum and the bispectrum at a similar I range, we compute the covariance 
for the case when 1 4 = li. 

We calculate the fractional covariance, -\/|Cov(^ ra d)|/|i3(/rad)C(Z ra d)|, where Z r ad is defined in Equation [29l and 
Cov(C-B) and B are contracted from three dimensions to one dimension as in Equations [30] and 1311 The results arc 
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Fig. 3. — The square root of the fractional covariance between the tSZ bispectrum amplitude B and the tSZ power spectrum amplitude 
C as a function of I for the different mass cuts. The three-dimensional quantities Cov[C(h)B(h , I2, 13)] and -B(Zi , £2 , '3) are contracted 
to one dimension as described in the text. From top to bottom, this quantity is plotted for no cluster cut, M2oo(Pcrit) > 8 X 10 14 Mq/}i 
clusters cut, and Af20o(Pcrit) > 3 X 10 14 M^/h clusters cut. 

shown in Figure[3]for the three different mass cuts used in the bispectrum estimate (no cut, Af2oo (Pcrit) > 8 x 1O 14 M / h 
clusters cut, and Af20o(pcrit) > 3 x 1O 14 M //i clusters cut). For a given mass cut, at lower I, the covariance increases 
while the power spectrum and bispectrum decrease slightly. Hence the fractional covariance increases at lower I. At 
higher /, the power spectrum and the bispectrum decrease, with the bispectrum decreasing slightly faster than the 
covariance. This is because the last term in Equation 1 of the Appendix drops quickly at large I, while the first 3 
terms stay non-zero (as they are the product of the bispectrum and the power spectrum) . The net result is that at 
large I, the ratio increases again. The minimum of the ratio appears at / ~ 3000 as both the power spectrum and 
the bispectrum peak in that range. The covariance decreases sharply with mass cut. This occurs because, with more 
clusters masked out, the last term of Equation 1 drops quickly. As shown in Figure |3l the fractional covariance is « 6% 
at I ~ 3000 for the mass cut M2oo(/Ocrit) = 8 x 10 14 M Q //i. This is small compared to the other sources of uncertainty 
in the A t sz-B t sz calcuation. 
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